RESPONSE ERRORS IN SURVEYS


Morris H. Hansen, William N. Hurwitz,
Eli S. Marks, and W. Parker Mauldin


Bureau of the Census 


By “error” in a survey result is meant the difference between a
survey estimate and the value which is estimated. Such error may
arise from a variety of sources. In sample surveys there is, of course, 
random sampling error and perhaps also sampling bias. Many surveys 
also involve non-sampling errors, i.e., errors which would be present 
even in a complete count. Non-sampling errors may be introduced into 
a survey either in the collection of the data or in their processing. We 
shall use the term “response error” to designate non-sampling errors 
introduced during the course of data collection. 

Response errors may be due to the questionnaire design, the inter- 
viewing approach, the characteristics, attitudes, or knowledge of the 
respondent, or a great many other causes. Regardless of the source, any 
systematic attempt to control or measure response errors must be based 
on a clear formulation of the way they arise. A listing and description of 
types of errors in surveys has been given by Deming [2]. Mahalanobis 
[4] has developed several important techniques for measuring and con- 
trolling response errors, particularly those arising from the interviewer. 

The control of processing errors can be dealt with in a way similar 
to the control of response errors. For example, the discussion below of 
controlling “interviewer variance” is applicable to the control of “coder 
variance” or “puncher variance.” There is one important difference be- 
tween processing and response errors: finding and correcting processing 
errors is inexpensive (relative to corresponding costs for response 
errors). Thus, control of processing errors can usually be achieved 
through verification procedures, while control of response errors may 
require high initial accuracy. However, comparison of methods used for 
processing error control with methods for response error control may 
be of significance for future work in both fields. 


One important feature of all surveys is the estimating procedure.
The processes of sampling, data collection, coding, punching, and 
tabulating introduce “errors” into survey results. These errors may be 
affected by the choice of an estimating procedure. The present paper 
does not attempt to deal with the choice of estimating procedures or 
with the control of processing errors. The goal of the present analysis 
is the explicit formulation of a mathematical model for “response 
errors.” An essential preliminary to such a formulation is a determina- 
tion of some of the important requirements that a mathematical model 
should meet in order to make it conform reasonably well to actual 
survey conditions. 


SOME REQUIREMENTS OF A MATHEMATICAL MODEL 
FOR RESPONSE ERRORS 

In defining “error” we have referred to an “estimate” and to a “value 
estimated.” The “estimate” is some value determined from the survey 
data and, for any particular survey, is a definite number but varies from 
survey to survey. In many surveys, the “value estimated” is not de- 
fined explicitly and the problem of survey design is complicated by 
vagueness regarding what is being measured. However, if the aim is 
orderly planning of a survey rather than catch-as-catch-can methods, 
it is essential that the “value estimated” be defined precisely. 


ESTIMATING AN AVERAGE OR AGGREGATE 


The most common type of “value estimated” in social surveys is one 
which is an average or an aggregate of the individual values that make 
up the population. Here we are dealing with a population of elements 
such that each element has attached to it some value of a variable, and 
we want to know the average or the aggregate of some or all of these 
values. For the present, we shall consider only the case in which we are 
estimating an average or aggregate of all the population elements. 

The case in which we are trying to estimate the average or aggregate 
for some population subgroup can also be handled using the mathe- 
matical model developed in this paper. A comprehensive treatment, 
however, would require analysis of the effects of response errors on 
ratios of random variables. This involves correlations between the er- 
rors in the two variables and correlations between the errors and the 
variables which affect both the variance and the bias of the estimate. 
Response bias usually will be present in estimates of subgroup means 
even when estimates of the population mean are unbiased, unless the 
errors are independent of each other and of the variables. Where errors 
of measurement have been considered in correlation analyses, the situa-
tions dealt with have been restricted to the case where errors are inde-
pendent and “attenuate” the correlation. The effect of correlated errors 
requires much more attention than it has been given. This topic will 
be the subject of another paper. 

In a sample survey to estimate a population aggregate or average, 
we observe the values of some of the population elements and derive 
the estimate from these observed values. The fact that we have selected 
for observation some but not all of the elements ordinarily introduces 
some error (sampling error). In addition, we frequently find that there 
are response errors in the individual observations. Thus, even if we 
were to observe all the elements of the population (i.e., take a census), 
we would usually have an error in our estimate of the population average
or aggregate. 

It should be noted that errors of non-response play a peculiar role. 
Failure to secure a response can be considered as a sampling bias since 
the “non-response” elements have a zero chance of inclusion in a sam- 
ple. Failure to secure a response can also be considered as a response 
error, since any estimating procedure involves assigning values to the 
non-response elements either implicitly or explicitly; e.g., estimating 
the population average on the basis of the respondents alone is equiva- 
lent to assigning each non-response element a value equal to the 
estimated average. Since non-responses occur in censuses as well as in 
sample surveys, it is probably best to consider non-responses as
response errors.¹
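The equivalence described above, between estimating from respondents alone and assigning each non-respondent the respondents' average, can be checked with a short Python sketch. The data are invented and the function names are hypothetical.

```python
from statistics import mean

def respondent_mean(values):
    """Mean over respondents only; None marks a non-response."""
    return mean(v for v in values if v is not None)

def mean_after_imputation(values):
    """Assign every non-respondent the respondent mean, then average all elements."""
    fill = respondent_mean(values)
    return mean(fill if v is None else v for v in values)

# Hypothetical sample of five elements, two of which did not respond.
sample = [4.0, None, 7.0, 1.0, None]
print(respondent_mean(sample), mean_after_imputation(sample))
```

Both estimates come out the same, which is why ignoring non-respondents is itself an (implicit) assignment of values to them.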


THE CONCEPT OF “INDIVIDUAL TRUE VALUES”


In defining the “value to be estimated” we shall define a true value 
for each of the individuals who make up the population and the value 
to be estimated is an average or aggregate of these individual true 
values. The individual true value will be conceived of as a characteristic 
of the individual quite independent of the survey conditions which 
affect the individual response. Thus, age is usually defined as a time 
interval between two events and this definition is quite independent of 
how we determine an individual’s age. It should be remembered, however,
that the number you get when you ask a person his age (or the
age of his wife or brother-in-law) is not necessarily the true value for
the age as defined. The respondent may not know his “true” age. Quite
frequently, he does not know exactly the age of his wife or others for
whom he may report. Even if he does know the correct answer, he may
misunderstand the question or become confused in “recall” or he may
purposely give an incorrect answer.

¹ The method for handling non-responses outlined by Hansen and Hurwitz [3] is a special case of the
methods presented in the present paper. One difference should be noted. The earlier approach con-
sidered a non-response as absolute for a given data collection technique (i.e., assumed non-respondents
of a sample would also be non-respondents in a census using the same technique); the present approach
assumes a probability of non-response for each individual (with absolute non-response as a special case).
This distinction does not, however, make any essential difference in the results.

AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1951


DIFFICULTIES OF ASCERTAINING INDIVIDUAL TRUE VALUES 


In the case of some variables (e.g., age or sex) a survey may get the 
true values for a large proportion of the individuals. In the case of
other variables (e.g., income, brand preference or purchase) the true 
values may be obtained for a much smaller proportion of the popula- 
tion. A survey rarely gets the true values for all the individuals, regard- 
less of the variable measured. Frequently, by a sufficient expenditure of 
well-directed effort we can approach the true value. We could, for 
example, try to determine age by examination of birth or baptismal 
certificates, or of primary school records where no birth certificate
exists, or of the first decennial census in which the individual was listed
if neither birth certificate nor primary school records exist. Exhaustive
record searches might give the true age for most individuals, although 
there would obviously be persons for whom we could find no records 
and other individuals whose records are in error. The searches would, of 
course, be expensive compared with methods ordinarily used for de- 
termining age. 

There are cases where the true value for the population is known even 
though the individual true values are unknown. In a study of voting be- 
havior in the 1948 presidential election, the true value could be defined 
as the way an individual’s vote was actually counted in the election. 
If no individual voted more than once and no vote was counted more 
than once, each individual in the population could be assigned one of 
four values (vote counted for Truman, vote counted for Dewey, vote 
counted for some other candidate, vote not counted for any candidate). 
The number of cases in the population could be determined for at least 
the first three categories even though we could not determine the true 
category for any particular individual. 


CRITERIA FOR A DEFINITION OF TRUE VALUE 


There are many cases in which we might encounter tremendous diffi- 
culty in defining a “true” value (entirely apart from the problem of 
determining the value once we have defined it). What, for example, is a 
person’s “true intelligence,” “true attitude toward revision of the Taft-
Hartley Act,” or “true brand preference for cigarettes?” No definitive
answer can be given to these questions. We would suggest, however, 
three criteria for the definition of “true value” (the first two essential,
the third useful but not essential):


1. The true value must be uniquely defined. 

2. The true value must be defined in such manner that the purposes 
of the survey are met. For example, in a study of school children’s 
intelligence, we would ordinarily not define the true value as the 
score assigned by the child’s teacher on a given date although this 
might be perfectly satisfactory for some studies (if, for example, 
our purpose was to study intelligence as measured by teacher’s 
ratings). 

3. Where it is possible to do so consistently with the first two criteria,
the true value should be defined in terms of operations which can 
actually be carried through (even though it might be difficult or 
expensive to perform the operations). 


It is possible to define true value so that a survey is subject to no (or 
negligible) response error. It will be useful to consider such definitions 
of “true value” in the light of the criteria listed above. For example, we 
could define a person’s “attitude toward revision of the Taft-Hartley 
Act” as the alternative (in a set of 6 alternatives) that he first selects 
after an accredited interviewer for the survey has asked him: “What do 
you think of the Taft-Hartley Act?” We could define a person’s birth- 
place as the answer recorded for him by an interviewer who is in- 
structed to ask: “In what state or foreign country were you born?” 
These definitions meet (or, with a little expansion, can be made to 
meet) 2 of the 3 criteria: they are unique and are defined in terms of 
operations which can be carried through. In most cases, however, they 
will not be acceptable as “true values.” There might, perhaps, be a 
survey director who would accept these definitions as the things he 
really wants to measure, but most consumers of data are after some- 
thing less dependent on the particular interview conditions (even 
though results of this type may be quite acceptable as approximations 
to the true value). We may want to know how a person is likely to act 
toward a Congressman who favored or opposed the Taft-Hartley Act, 
not what his casual reply is to a rather vague question asked by a 
person whose motives and sponsorship may generate a very complex 
reaction in the respondent. We may want to know where a person was 
actually born, not what gets recorded as his birthplace when the inter-
viewer fails to ask the question properly, or the respondent misunder-
stands the question, or the interviewer misinterprets the answer.
USE OF AN EXPECTED RESPONSE VALUE TO
APPROXIMATE THE TRUE VALUE


In the examples cited (and in many other cases) it may be impossible 
to define a true value which meets all of the three criteria listed. Often, 
however, we can define a value which meets the first two criteria and 
can at least define an operation whose “expected value” will give a 
satisfactory approximation to the true value. An example of this tech- 
nique is provided in a study by the Bureau of the Census in which a 
population census was taken of four counties. After the census was 
completed by the temporary personnel hired as enumerators, perma- 
nent staff members of the Bureau recanvassed each “enumeration dis- 
trict,” taking with them a record of the original enumeration, looking 
carefully for persons missed in the original enumeration and checking a 
sample of those persons who were enumerated in the area to make sure 
that they should have been enumerated. The individuals who did the 
recanvass were (in general) well-trained, conscientious, and thoroughly 
familiar with the rules that prescribe which persons are to be enumer- 
ated in a given enumeration district. The recanvass procedure did not, 
of course, insure a “perfect” measurement for each individual but it 
came nearer to doing so than the procedure used originally. 

Consider interviewing each individual a large number of times under 
exactly the same conditions as the recanvass. This would yield a popu- 
lation of responses for all individuals. We can then draw a sample of 
individuals and one of the possible responses from each of the indi- 
viduals in the sample. The expected value of an estimate from this 
sample could be regarded as approximating the true value. For a 
reasonably large set of such observations, the aggregate results esti- 
mated from the recanvass would be close to the “true population 
count.” 


THE CONCEPT OF INDIVIDUAL RESPONSE ERROR


The term “individual response error” will be used here to denote the 
difference between an individual observation and the true value for the 
individual. For example, the survey might want age as of last birthday 
as a difference in whole years between date of birth and some specified 
date (say, April 1, 1950). If in 1950 one of the persons covered by the 
survey was born April 1, 1897 but is reported as 50 years old, the “indi-
vidual response error” would be −3 years.
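The age arithmetic can be written as a short Python sketch. `age_last_birthday` is a hypothetical helper that counts whole years completed by the reference date, as the definition above requires; the error is the reported value minus the true value.

```python
from datetime import date

def age_last_birthday(born: date, as_of: date) -> int:
    """Whole years completed between the date of birth and the reference date."""
    years = as_of.year - born.year
    # One year fewer if the birthday has not yet been reached by the reference date.
    if (as_of.month, as_of.day) < (born.month, born.day):
        years -= 1
    return years

REFERENCE_DATE = date(1950, 4, 1)  # reference date used in the example above
true_age = age_last_birthday(date(1897, 4, 1), REFERENCE_DATE)
reported_age = 50
individual_response_error = reported_age - true_age  # obtained minus true
```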

A less obvious case of response error is the failure to report an indi- 
vidual in a census of population. Here the “true” value (the value the 
census is trying to obtain) is 1 (1 person), the obtained value for this 
individual is 0, and the response error is −1. Similarly, counting an
individual twice would be an error of +1. 
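In the count case the variable for each person is the number of times that person is enumerated, with true value 1. A hypothetical four-person enumeration in Python shows that a miss and a double count are each individual response errors, yet they cancel in the census total:

```python
# Hypothetical enumeration: how many times each person was counted.
times_counted = {"A": 1, "B": 0, "C": 2, "D": 1}

# True value is 1 for every person, so each individual error is count - 1.
individual_errors = {person: n - 1 for person, n in times_counted.items()}

# Net error of the census count equals the sum of the individual errors.
net_error = sum(individual_errors.values())
```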


VARIANCE AND BIAS OF RESPONSE ERRORS


As here defined, an “individual response” is the value obtained on a 
particular observation (e.g., the result obtained in a specified measure-
ment or interview by a specified interviewer with a specified respondent
at a given time). Under slightly different conditions, therefore, the 
value of the individual response might be different. Thus the indi- 
vidual response is influenced by the conditions of the observation or 
interview or written response. 

The variability of individual responses has usually been treated in 
terms of random variation. While this approach has certain defects, 
we shall adopt it for purposes of the present analysis. Consequently, 
the response error of a particular individual in a given survey will be 
thought of as having an expected value (the individual response bias) 
and a random component of variation around that expected value. 
Similarly, the aggregate or average of a set of responses for different 
individuals will have a response bias and a response variance which
will be determined by the individual biases and variances for the popula-
tion of individuals.
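Under the random-variation view adopted here, a minimal simulation illustrates the decomposition. It assumes, for illustration only, that each response is the true value plus an individual response bias plus a normally distributed random component; all numbers are invented. Repeating the survey under the same essential conditions, the average error of the population mean settles at the average of the individual biases, while the random components tend to cancel.

```python
import random

def simulate_mean_error(true_values, biases, sd, trials, seed=0):
    """Average error of the population mean over repeated surveys.

    Each response = true value + individual response bias + random component
    (assumed normal with mean 0 and standard deviation `sd`).
    """
    rng = random.Random(seed)
    n = len(true_values)
    true_mean = sum(true_values) / n
    total_error = 0.0
    for _ in range(trials):
        responses = [x + b + rng.gauss(0.0, sd) for x, b in zip(true_values, biases)]
        total_error += sum(responses) / n - true_mean
    return total_error / trials

true_values = [30, 45, 52, 61]       # hypothetical true values
biases = [0.0, -1.0, -2.0, 0.5]      # hypothetical individual response biases
avg_error = simulate_mean_error(true_values, biases, sd=1.5, trials=20_000)
```

The result should lie close to the mean of `biases`, that is, the response bias of the mean.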


ESSENTIAL CONDITIONS OF A SURVEY


To say that an individual response is a random variable is not, how- 
ever, sufficient; we must define somewhat more precisely the universe 
of individual responses involved. For this purpose we shall consider all 
responses obtainable under certain “essential” conditions. In general, 
these conditions are “specified” (either implicitly or explicitly) by the 
survey design. As a minimum, a survey design must specify the subject 
of inquiry, the method of obtaining information (interview, mail in- 
quiry, direct observation, etc.), and the method of recording the in- 
formation (i.e., checking a box, entering a figure, writing a description 
of the response, etc.). These specifications may be general or specific. 

Particular surveys may involve additional specifications, e.g., that 
the survey be taken during a particular period. There are also “essen- 
tial” conditions of a survey which arise implicitly as necessary conse- 
quences of the explicitly specified conditions. For example, if we specify 
that a survey of individual income received during 1949 be taken during 
April 1950, there is implicit in that specification a certain “recall” 
situation for each respondent and a relationship of this “recall” situa- 
tion to income tax filing activities. If we also specify that responses be 
obtained by interview, the fact that the survey is to be done in April
1950 implicitly specifies a certain condition of the labor market and this 
may impose restrictions on the type of interviewer obtainable. The 
compensation paid and training given to interviewers, the wording of 
questions to be asked, the sponsorship of the survey, are frequently a 
part of the specified survey conditions and these specifications deter- 
mine, in turn, other conditions which will distinguish this response 
situation from other response situations. 

On the other hand, there are usually present at the time of any re- 
sponse, conditions which may affect that response but which are neither 
specified survey conditions nor the direct consequences of specified 
survey conditions. If the survey design specifies the types of inter- 
viewers, the sponsorship of the study, the compensation offered, and 
the hiring procedures used, these specifications may make it certain 
that John Jones will be interviewed by one of a certain class of indi- 
viduals (e.g., persons over 30 years of age who have had at least two 
years of high school education and some experience as interviewers for 
other surveys), but the exact identity of the interviewer may still vary 
within the limits of the specified class. The survey design may instruct 
the interviewer to ask certain questions, but it cannot insure that the 
questions are always asked in exactly the same way. The survey design 
may specify a certain approach to respondents, but it will not specify 
how that approach will be received by a respondent who happens to be 
interrupted while she is doing the family laundry. 

In general, the survey specifications (explicit or implicit) restrict the 
range of response variation but by no means eliminate variation com- 
pletely. Under some conditions, the range of variation will be narrow; 
under others it will be wide. Similarly, the response errors may be 
compensating in character or they may be more or less systematic in 
direction, thus creating a response bias. The expected value of the re- 
sponse errors, and the random component of variation around that 
expected value, may be regarded as determined by the essential survey 
conditions. 

In practice, some of the essential conditions of a survey will be diffi- 
cult to separate from the unessential ones, but the fact that some are 
essential and others are of an accidental character needs to be recog- 
nized. Often the problem of improving survey design will be to identify 
and deal with some of the more important essential conditions. 


CORRELATION OF RESPONSE ERRORS WHEN 
INTERVIEWERS ARE USED 


It would be convenient to assume that in any particular survey the 
random component of the response error for one individual is un-
correlated with the random component of the response error for another
individual. Such an assumption, unfortunately, does not accord with 
known facts about response variation. In particular, a mathematical 
model which postulates independent responses of all individuals will 
not fit a survey which uses interviewers, unless the interviewer is as-
sumed to have no influence on the response. If we were to assign at
random a different interviewer to each individual, the effect of the 
interviewer on the response would be uncorrelated for any two re- 
sponses. Ordinarily, however, a given interviewer obtains and records 
the responses for a number of individuals and we have reason to believe 
that the errors made by a particular interviewer are positively corre- 
lated. Even casual observation of an interviewer at work reveals the 
presence of interviewing patterns distinctive of that interviewer. In an 
inquiry about labor force status, an interviewer who implies by his
manner that he does not expect to find housewives gainfully employed, 
will tend to record fewer employed women than an interviewer who 
seems to insist that every adult should be gainfully employed. 
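The positive correlation of errors within one interviewer's assignment can be illustrated with a small simulation. The additive model below (a fixed slant per interviewer plus independent individual noise, both normal with mean zero) is an assumption for illustration, not a claim about how interviewer effects actually arise. Because the errors have mean zero, the average product of two errors estimates their covariance.

```python
import random

def error_covariances(k, m, sd_interviewer, sd_individual, seed=1):
    """Return (same-interviewer, different-interviewer) average error products.

    Hypothetical model: error = interviewer slant + independent individual noise.
    """
    rng = random.Random(seed)
    assignments = []
    for _ in range(k):
        slant = rng.gauss(0.0, sd_interviewer)  # shared by this interviewer's cases
        assignments.append([slant + rng.gauss(0.0, sd_individual) for _ in range(m)])
    # Pairs of different individuals recorded by the same interviewer.
    same = [a[i] * a[j] for a in assignments for i in range(m) for j in range(i + 1, m)]
    # Pairs of different individuals recorded by different interviewers.
    diff = [assignments[b][0] * assignments[b + 1][0] for b in range(k - 1)]
    return sum(same) / len(same), sum(diff) / len(diff)

same_cov, diff_cov = error_covariances(k=5000, m=5, sd_interviewer=1.0, sd_individual=2.0)
```

Under this model the same-interviewer covariance should come out near `sd_interviewer ** 2` (here 1.0), and the different-interviewer covariance near zero.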

The present analysis uses a mathematical model which assumes that 
responses are uncorrelated if they are obtained for different individuals 
by different interviewers. There may be correlation between responses 
even when both the individual and interviewer are different. For ex- 
ample, the presence of a common supervisor or participation in the 
same training class may result in correlated errors for two different 
interviewers (unless these common influences are specified as essential 
conditions). We shall assume that these correlations are small and can 
be neglected, although the model could be extended to include them. 


SPECIFICATION OF A MATHEMATICAL MODEL


The discussion thus far presented leads to a mathematical model for
the analysis of response errors in which we have:

(a) A population of N individuals and a population of K interviewers. 

(b) Associated with each individual, a true value. 

(c) A set of essential survey conditions which determine for a par- 
ticular individual and interviewer the expected value of a random 
variate. 

(d) Zero correlation between the random component of responses for 
two different individuals with two different interviewers. 


In many surveys, interviewers are available to interview only certain 
classes of the population and only in certain geographic areas. We shall, 
therefore, conceive of our interviewers as divided into $L$ groups with
$K_A$ interviewers in the $A$-th group who are available to interview a
particular $N_A$ individuals and no others. Where all interviewers are
available to interview all individuals, $L=1$, $K_A=K$, and $N_A=N$.
THE EFFECT OF INTERVIEWERS ON THE VARIANCE
OF SAMPLE ESTIMATES


EFFECT OF RESPONSE ERRORS ON ESTIMATES OF SAMPLE VARIANCE 


One major advantage possessed by probability sampling, as com- 
pared with the other types of sampling, is the possibility of estimating 
the sampling error from the sample. In situations where the sample 
results are uniquely determined by the act of selecting the sample indi-
viduals (i.e., given the fact that the $i$-th individual is in the sample,
one and only one value is to be ascribed to the variate for the $i$-th
individual) there is, of course, no question of our ability to estimate 
sampling error from a reasonably large sample. 

When the individual responses are subject to error, we shall see that, 
with appropriate methods, the sampling variance of a statistic such as 
a mean or total will reflect the response variation, as well as the error 
due to including only a sample of individuals. Appropriate analysis of 
the sources of error will point to the methods for minimizing the total 
variance. However, while the use of probability sampling will insure 
that the variance of the individual true values will be appropriately 
reflected in the variance of a sample mean, the accurate reflection of 
response variance will depend on the applicability of whatever mathe- 
matical model is assumed. 

The response bias of a statistic such as an estimated mean or total 
will not be reflected in the variance of a sample statistic, although its 
effect, if it can be estimated, will be reflected in the mean square error 
and its effect on accuracy thus taken into account. Response bias is 
not per se a “sampling” problem, i.e., bias arising from response errors 
is, in general, independent of the sample design and is, in fact, of the 
same magnitude for a study involving a complete canvass of the popu- 
lation as it is for a sample survey if they are taken under the same es- 
sential conditions. Postponing to a later section the consideration of 
response bias, we shall examine first the other component of survey 
error, i.e., the variance of a sample estimate, and shall examine par- 
ticularly the contribution of the interviewer to this variance. 


THE DESIGN OF A SURVEY TO EVALUATE RESPONSE
VARIANCE DUE TO INTERVIEWERS 


For a discussion of sampling variance we must consider, of course, 
some particular technique of drawing a sample and making an estimate 
from this sample. In studies which involve the use of interviewers, we 
must consider also some specified technique for drawing the inter-
viewers and assigning them to the various individuals included in the
sample. 

Actually, survey practice in making interviewer assignments is far 
from standard. A common pattern is to sort out the cases drawn in the 
sample by geographic areas and then to assign the cases in a given area 
to one or more interviewers, making the different interviewers’ assign- 
ments approximately equal. The cases drawn may be individuals or 
clusters but, in either event, in surveys in which interviewers are used, 
costs of travel and time required for identification of the sample usually 
suggest some clustering of the assignments to interviewers. This clus- 
tering of interviewer assignments led to the introduction of interviewer 
groups into the population specifications outlined above. In terms of 
the specified mathematical model, this practice can be described as 
follows: 


(a) $n$ of the $N$ individuals in the population are selected at random
without restriction.²

(b) $k_A$ interviewers are selected at random without restriction from
the $A$-th interviewer group to interview those sample individuals
who are available for interview by this interviewer group. Let
$k = \sum_A k_A$ be the total number of interviewers selected.

(c) The same number, $\bar n$, of individuals is assigned to each of the $k_A$
interviewers. The $\bar n$ individuals assigned to any interviewer are a
random subsample of all the sample individuals available for
interview by this interviewer group.
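Conditions (a) to (c) can be sketched as a sampling routine. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, randomness comes from Python's `random` module, and the sketch raises an error when some drawn $n_A$ is not a multiple of $\bar n$, a case the conditions leave implicit.

```python
import random
from collections import defaultdict

def draw_and_assign(population_groups, n, n_bar, interviewer_pools, seed=0):
    """Hypothetical sketch of conditions (a)-(c).

    population_groups: dict individual -> interviewer group A available to it.
    interviewer_pools: dict group A -> list of interviewer ids in that group.
    Returns dict interviewer id -> list of the n_bar individuals assigned.
    """
    rng = random.Random(seed)
    # (a) n individuals drawn at random without replacement from all N.
    sample = rng.sample(sorted(population_groups), n)
    by_group = defaultdict(list)
    for person in sample:
        by_group[population_groups[person]].append(person)
    assignments = {}
    for group, people in by_group.items():
        k_a, remainder = divmod(len(people), n_bar)
        if remainder:
            raise ValueError(f"n_A={len(people)} in group {group} is not a multiple of n_bar")
        # (b) k_A interviewers drawn at random from group A.
        interviewers = rng.sample(interviewer_pools[group], k_a)
        # (c) random subsamples of size n_bar, one per interviewer.
        rng.shuffle(people)
        for i, interviewer in enumerate(interviewers):
            assignments[interviewer] = people[i * n_bar:(i + 1) * n_bar]
    return assignments
```

Fixing $\bar n$ and letting $k_A$ follow from $n_A$ mirrors the choice the text makes below; the alternative of fixing $k_A$ would instead divide $n_A$ among a set number of interviewers.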


The applicability of these conditions to actual surveys will be con- 
sidered later. With certain restrictions on the definition of interviewer 
groups, the conditions stated apply reasonably well to many surveys. 

It should be noted that $n_A$, the number of sample cases which will be
available only to interviewers in the A-th group, is a random variable. 
In designing a survey we could decide to use a fixed number of inter- 
viewers from the A-th group and adjust the size of assignment given 
each interviewer. For example, if we were using two interviewers for a 
given group and happened to draw 84 sample cases available to this 
group, we could give each interviewer 42 cases; if we drew 76 indi-
viduals, each interviewer would be assigned 38, etc. Another method of
determining interviewer assignments is the one used here, i.e., to fix
the size of the assignments and let the number of interviewers vary.
The restriction that the size of the interviewer assignment be fixed does
not represent any great loss of generality, since the variance of most
sample estimates will be about the same whether the size of assignment
or the number of interviewers in a group is fixed.

² We shall restrict this discussion to unrestricted random sampling. The results can be extended
to cluster sampling by treating each cluster as an individual (dividing the sample mean, of course, by $\bar N$,
the average number of individuals per cluster, and the variances by $\bar N^2$).


THE SAMPLE ESTIMATE AND ITS MEAN SQUARE ERROR


Let
$y_{Abc}$ = value obtained for the $c$-th sample individual by the $b$-th sample
interviewer in the $A$-th (population) group.

$$\bar y = \frac{1}{n}\sum_{A}\sum_{b}\sum_{c} y_{Abc} = \text{the sample mean.} \qquad (1)$$

With the sample design specified, $\bar y$ would be used as an estimate of
the true population mean $\bar X$. The mean square error of $\bar y$ is:

$$\text{M.S.E. } \bar y = B_{\bar y}^2 + \sigma_{\bar y}^2 \qquad (2)$$

where
$B_{\bar y} = E(\bar y) - \bar X$,
$E(\bar y)$ = the expected value of $\bar y$.

The Appendix (pp. 178-183) gives the derivation of an expression
for $\sigma_{\bar y}^2$, which is approximately³ equal to:

$$\sigma_{\bar y}^2 \approx \frac{\sigma_y^2 - \sigma_{yI}}{n} + \frac{\sigma_{yI}}{k} \qquad (3)$$

Here $\sigma_y^2$ represents the “total variance” of individual responses
around the mean of all individual responses in the population, i.e., it is
the variance over all responses for a given individual to all interviewers
and over all interviewers and interviewer groups, and $\sigma_{yI}$ is the
covariance between responses obtained from different individuals by the same
interviewer (this covariance being taken within interviewer groups,
since independent selections of interviewers are made from each
interviewer group). If we divide the covariance $\sigma_{yI}$ by the variance of
responses within interviewer groups, we have the intraclass correlation,
i.e., the correlation between responses of different individuals for the
same interviewer. Thus:





$$\sigma_y^2 = \sigma_{yw}^2 + \sigma_{yB}^2 \qquad (4)$$

$$\rho = \frac{\sigma_{yI}}{\sigma_{yw}^2} \qquad (5)$$

where
$\sigma_{yw}^2$ = variance of responses within interviewer groups (taken over all
responses of every individual to every interviewer in the group).
$\sigma_{yB}^2$ = variance of expected responses for interviewer groups.

³ Assuming that $N$ is large relative to $n$ and that the interviewers used in the $A$-th group are a
random sample from a potentially infinite supply of such interviewers.


The sampling design used is, in effect, to sample "clusters" of responses (the responses obtained by each of the interviewers) and, within sample clusters, to subsample \(\bar{n} = n/k\) responses (such that no two responses are for the same individual). The similarity to cluster sampling can be seen more easily if we express \(\sigma_{\bar{y}}^2\) as:

\[ \sigma_{\bar{y}}^2 = \frac{\sigma_{yw}^2}{n}\left\{1 + (\bar{n} - 1)\rho\right\} + \frac{\sigma_{yB}^2}{n}. \qquad (6) \]

In the expressions above, \(\sigma_{yB}^2/n\) represents the variance arising because individuals were sampled independently of the interviewer groups. If we sampled individuals within interviewer groups or if we had only one interviewer group (\(L = 1\)), \(\sigma_{yB}^2 = 0\) and \(\sigma_{yw}^2 = \sigma_y^2\), so that:

\[ \sigma_{\bar{y}}^2 = \frac{\sigma_y^2}{n}\left\{1 + (\bar{n} - 1)\rho\right\}. \qquad (7) \]


This formula is identical with that for the variance of a sample mean when we draw k clusters of \(\bar{n}\) elements each (or sample k clusters of equal size and subsample \(\bar{n}\) elements from each cluster). There are, of course, differences from straight cluster sampling arising from the restriction that we must sample only one response for any individual, but the basic sampling principles are entirely analogous.
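As a numerical check on the algebra, the Python sketch below evaluates \(\sigma_{\bar{y}}^2\) both from (3) and from the cluster form (6), taking \(\sigma_y^2 = \sigma_{yw}^2 + \sigma_{yB}^2\), \(\rho = \sigma_{yI}/\sigma_{yw}^2\) and \(\bar{n} = n/k\). The function names and parameter values are hypothetical illustrations, not survey data.

```python
def var_mean_eq3(sigma2_y, sigma_yI, n, k):
    """Variance of the sample mean, equation (3)."""
    return (sigma2_y - sigma_yI) / n + sigma_yI / k


def var_mean_eq6(sigma2_yw, sigma2_yB, sigma_yI, n, k):
    """The same variance in the cluster-sampling form (6)."""
    nbar = n / k                   # responses per interviewer assignment
    rho = sigma_yI / sigma2_yw     # intraclass correlation within assignments
    return sigma2_yw / n * (1 + (nbar - 1) * rho) + sigma2_yB / n


# Arbitrary illustrative parameters, not survey data.
sigma2_yw, sigma2_yB, sigma_yI = 90.0, 10.0, 2.0
sigma2_y = sigma2_yw + sigma2_yB   # total = within-group + between-group variance
print(var_mean_eq3(sigma2_y, sigma_yI, n=400, k=20))
print(var_mean_eq6(sigma2_yw, sigma2_yB, sigma_yI, n=400, k=20))
```

Both lines print 0.345, up to floating-point rounding. With n held at 400, raising k to 400 (one individual per interviewer) brings the value down to 0.25, which is \(\sigma_y^2/n\) with no interviewer clustering.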


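ESTIMATES OF VARIANCE FROM THE SAMPLE

From the sample we can obtain unbiased estimates of \(\sigma_{yI}\) and \(\sigma_y^2\) (see Appendix pp. 184-186). These estimates are, respectively:

\[ s_{yI} = \frac{1}{k}\sum_{A}^{L}\frac{k_A}{k_A - 1}\sum_{b}^{k_A}\left(\bar{y}_{Ab} - \bar{y}_A\right)^2 - \frac{1}{n(\bar{n} - 1)}\sum_{A}^{L}\sum_{b}^{k_A}\sum_{c}^{\bar{n}}\left(y_{Abc} - \bar{y}_{Ab}\right)^2 \qquad (8) \]

\[ s_y^2 = \frac{1}{n - 1}\sum_{A}^{L}\sum_{b}^{k_A}\sum_{c}^{\bar{n}}\left(y_{Abc} - \bar{y}\right)^2 + \frac{n - k}{n - 1}\cdot\frac{s_{yI}}{k} \qquad (9) \]

where

\[ \bar{y}_{Ab} = \frac{1}{\bar{n}}\sum_{c}^{\bar{n}} y_{Abc} = \text{average for the } b\text{-th sample interviewer in the } A\text{-th group,} \]

\[ \bar{y}_A = \frac{1}{k_A\bar{n}}\sum_{b}^{k_A}\sum_{c}^{\bar{n}} y_{Abc} = \frac{1}{k_A}\sum_{b}^{k_A}\bar{y}_{Ab} = \text{sample average for the } A\text{-th group.} \]

Thus, an unbiased estimate of \(\sigma_{\bar{y}}^2\) is

\[ s_{\bar{y}}^2 = \frac{1}{n(n - 1)}\sum_{A}^{L}\sum_{b}^{k_A}\sum_{c}^{\bar{n}}\left(y_{Abc} - \bar{y}\right)^2 + \frac{n - k}{n - 1}\cdot\frac{s_{yI}}{k}. \qquad (12) \]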
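The estimators (8), (9) and (12) translate directly into code. The Python sketch below is illustrative only. It assumes every interviewer has the same number \(\bar{n} \ge 2\) of responses and every group has at least two interviewers. The data layout (a dict from a group label to a list of per-interviewer response lists) and the function name are hypothetical choices.

```python
from statistics import fmean


def variance_estimates(data):
    """Return (s_yI, s_y2, s_ybar2) following equations (8), (9) and (12).

    `data` maps each interviewer group A to a list of assignments, one per
    sample interviewer; every assignment holds the same number nbar of
    responses y_Abc (an illustrative layout).
    """
    assignments = [a for group in data.values() for a in group]
    k = len(assignments)
    nbar = len(assignments[0])
    n = k * nbar
    ybar = fmean(y for a in assignments for y in a)

    between = 0.0   # sum over groups of k_A/(k_A-1) * sum_b (ybar_Ab - ybar_A)^2
    within = 0.0    # sum of squares of y_Abc about their interviewer means
    for group in data.values():
        k_A = len(group)
        means = [fmean(a) for a in group]           # ybar_Ab
        group_mean = fmean(means)                    # ybar_A
        between += k_A / (k_A - 1) * sum((m - group_mean) ** 2 for m in means)
        within += sum((y - m) ** 2 for a, m in zip(group, means) for y in a)

    s_yI = between / k - within / (n * (nbar - 1))                     # (8)
    total_ss = sum((y - ybar) ** 2 for a in assignments for y in a)
    s_y2 = total_ss / (n - 1) + (n - k) / (n - 1) * s_yI / k            # (9)
    s_ybar2 = total_ss / (n * (n - 1)) + (n - k) / (n - 1) * s_yI / k  # (12)
    return s_yI, s_y2, s_ybar2


# Hypothetical responses: two groups, two interviewers each, three responses per interviewer.
example = {
    "A1": [[3.0, 5.0, 4.0], [6.0, 7.0, 8.0]],
    "A2": [[2.0, 2.0, 5.0], [4.0, 6.0, 5.0]],
}
print(variance_estimates(example))
```

As a consistency check, the three outputs satisfy \(s_{\bar{y}}^2 = (s_y^2 - s_{yI})/n + s_{yI}/k\). In other words, (12) is (3) with the estimates substituted.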


CONTRIBUTION OF INTERVIEWERS TO THE VARIANCE 


It should be noted that the effect of using interviewers is to introduce into the variance of \(\bar{y}\) a term involving the intraclass correlation within interviewers' assignments. The usual technique for estimating the variance of \(\bar{y}\) from a sample ignores this intraclass correlation. If we ignore the intraclass correlation within interviewers' assignments, the estimate of the variance of \(\bar{y}\) would be the first term of (12). It can be seen that the result will usually be an underestimate. If \(k = n\), each interviewer interviews only one individual and there is, of course, no intraclass correlation. In this case, the customary formula for the variance is a good approximation.⁴ On the other hand, the more individuals we assign to an interviewer, the more important it is to use (12) to estimate the variance of \(\bar{y}\) rather than the customary estimating formula.

⁴ A study director would not ordinarily assign a single individual to an interviewer but, with cluster samples, a single cluster might be assigned to an interviewer. Where an interviewer is assigned to a single cluster, the variance between clusters will include interviewer variance, by the usual analysis. Where more than one cluster is assigned to an interviewer, we can estimate the interviewer variance separately by use of the methods given here.

Equation (12) is useful in indicating the effect of interviewer error upon the variance of a sample mean. Where there is no need for a separate estimate of \(s_{yI}\), (12) can be written:

\[ s_{\bar{y}}^2 = \frac{1}{k(n - 1)}\sum_{A}^{L}\frac{n_A - 1}{k_A - 1}\sum_{b}^{k_A}\left(\bar{y}_{Ab} - \bar{y}_A\right)^2 + \frac{1}{n(n - 1)}\sum_{A}^{L} n_A\left(\bar{y}_A - \bar{y}\right)^2 \qquad (13) \]

where \(n_A = k_A\bar{n}\) is the number of sample individuals in the A-th group. With a single interviewer group (\(L = 1\)), this reduces to

\[ s_{\bar{y}}^2 = \frac{\sum_{b}^{k}\left(\bar{y}_{Ab} - \bar{y}\right)^2}{k(k - 1)}. \qquad (14) \]
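Form (13) needs only the interviewer means and group means. The Python sketch below computes it. Its data layout is a hypothetical choice: a dict from a group label to a list of equal-size per-interviewer response lists. On the same data it should agree with (12). For a single group it reduces to (14).

```python
from statistics import fmean


def s_ybar2_eq13(data):
    """Variance estimate of the sample mean in the form (13).

    `data` maps each interviewer group A to a list of equal-size assignments
    (one list of responses per sample interviewer); an illustrative layout.
    Each group needs at least two interviewers.
    """
    groups = list(data.values())
    k = sum(len(g) for g in groups)
    n = sum(len(a) for g in groups for a in g)
    ybar = fmean(y for g in groups for a in g for y in a)

    first = 0.0    # between interviewers within groups
    second = 0.0   # between group means
    for group in groups:
        k_A = len(group)
        n_A = sum(len(a) for a in group)
        means = [fmean(a) for a in group]           # interviewer means ybar_Ab
        y_A = fmean(means)                           # group mean ybar_A
        first += (n_A - 1) / (k_A - 1) * sum((m - y_A) ** 2 for m in means)
        second += n_A * (y_A - ybar) ** 2
    return first / (k * (n - 1)) + second / (n * (n - 1))


# Single hypothetical group (L = 1): the result equals (14).
print(s_ybar2_eq13({"A": [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]]}))
```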





The covariance, \(\sigma_{yI}\), reflects the effect on response of the interviewer and the interaction between interviewer and respondent. If the observation is of a type which permits a large effect of the interviewer on the response, \(\sigma_{yI}\) may be quite substantial. In many cases, however, \(\sigma_{yI}\) will be negligible. We might, for example, expect a large variance among interviewers for estimates of farm acreage under corn where the interviewer does the estimating by direct observation without measurement. Where the farmer furnishes the interviewer with information about the number of cattle on a farm, there might be little or no effect of the interviewer on the response, and \(\sigma_{yI}\) might be negligible or zero.


REDUCING INTERVIEWER CONTRIBUTION TO VARIANCE 


Where interviewer contribution to the variance is important, it may be possible to reduce this contribution significantly by training and adequately supervising the interviewers. Often, however, training of the interviewer beyond a certain point will have very little effect on interviewer variance. Instead of trying to reduce interviewer variance by additional training and supervision or using other (and usually expensive) techniques to obtain greater interviewer uniformity, we might devote our attention to reducing the effect of interviewer variance on our final estimate. From (3) it will be seen that, for fixed values of \(\sigma_{yI}\), the effect of interviewer variance on \(\sigma_{\bar{y}}^2\) decreases as we increase the number of interviewers. Thus, if cost were not a factor, maximum accuracy with this sample design would be obtained by assigning one individual to each interviewer.⁵





⁵ This statement is subject to the condition that response bias and variance between interviewers
remain fixed. Ordinarily, it will not be possible to make extreme changes in size of interviewer assign- 
ment without changing the response bias and interviewer variance, but the analysis is acceptable within 
reasonable limits of variation. 
DETERMINING THE OPTIMUM NUMBER OF INTERVIEWERS


With the ordinary survey which has a fixed total budget, increasing 
the number of interviewers will increase costs and will require a reduc- 
tion of expenditures at some other point, e.g., reducing the expenditure 
per interviewer or per individual or reducing the number of individuals 
included in the sample. When the cost function is simple, optimum 
values of n and k can be determined by joint solution of the cost and 
variance functions. With complicated cost functions, the optimum 
values can be determined by trying various sets of values which satisfy 
the cost function and determining the set which gives the smallest 
mean square error. 
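For a complicated cost function, the text suggests trying sets of values that satisfy the budget and keeping the set with the smallest mean square error. The Python sketch below is a minimal version of that search. It is an illustration, not the authors' procedure, and the function name `best_allocation` is hypothetical. It assumes the response bias stays fixed as n and k vary, so only the variance (3) changes. It also assumes the cost function is nondecreasing in n, and it ignores the requirement that n be an exact multiple of k.

```python
from typing import Callable, Optional, Tuple


def best_allocation(sigma2_y: float, sigma_yI: float,
                    cost: Callable[[int, int], float], budget: float,
                    k_max: int) -> Optional[Tuple[float, int, int]]:
    """Try k = 1..k_max; for each k take the largest n the budget allows.

    Returns (variance from (3), n, k) for the smallest variance found.
    Assumes cost(n, k) is nondecreasing in n and eventually exceeds the budget.
    """
    best = None
    for k in range(1, k_max + 1):
        lo, hi = 0, 1
        while cost(hi, k) <= budget:        # grow an upper bound on n
            hi *= 2
        while hi - lo > 1:                  # binary search for the largest affordable n
            mid = (lo + hi) // 2
            if cost(mid, k) <= budget:
                lo = mid
            else:
                hi = mid
        n = lo
        if n < k:                           # need at least one individual per interviewer
            continue
        variance = (sigma2_y - sigma_yI) / n + sigma_yI / k
        if best is None or variance < best[0]:
            best = (variance, n, k)
    return best


# Nagpur figures from Table I with the linear cost 2n + 80k and a budget of 2000.
result = best_allocation(399.1, 0.80, lambda n, k: 2 * n + 80 * k, 2000, 24)
print(result[1:])   # (760, 6)
```

With the linear cost used in the illustrations below, the search returns n = 760 and k = 6, the allocation shown for Nagpur in Table I. Any other cost function can be passed in the same way.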

To simplify the analysis, we shall consider the case in which the cost is given by:

\[ C = C_y n + C_{yI} k \qquad (15) \]

where

C = total budget available for field work on the survey,
\(C_y\) = cost per individual,
\(C_{yI}\) = cost per interviewer.


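With this cost function and the variance given by (3), the optimum values of n and k are:

\[ n = A\sqrt{\frac{\sigma_y^2 - \sigma_{yI}}{C_y}} \qquad (16) \]

\[ k = A\sqrt{\frac{\sigma_{yI}}{C_{yI}}} \qquad (17) \]

where

\[ A = \frac{C}{\sqrt{C_y\left(\sigma_y^2 - \sigma_{yI}\right)} + \sqrt{C_{yI}\,\sigma_{yI}}}. \qquad (18) \]


SOME ILLUSTRATIONS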


Assuming that cost functions are known or can be roughly approxi- 
mated, application of the technique may be illustrated by data from 
two studies where interviewer assignments were actually randomized. 
For a satisfactory estimate of interviewer variance, however, we will 
need more interviewers than the numbers used in the studies men- 
tioned here. The interviewer variance estimates of Tables I and II be- 
low are based on a very small number of cases and are, therefore, quite 
unreliable. They are presented only for purposes of illustration. 
The Indian Statistical Institute has pioneered in the design of surveys so as to make possible the evaluation of response variation associated with the interviewer. Methods similar to the survey design described in this section have been used for some time by the Indian Statistical Institute to control and measure the effects of the "human agency." Some of these techniques are described by Mahalanobis [4].
One such design was used in an inquiry to determine economic condi- 
tions of factory workers in an industrial area at Jagaddal. The entire 
area was divided into five subareas. Within each subarea five inde- 
pendent random samples of structures were selected for interview. 
Each of the five samples was assigned to a different interviewer, but 
the same five interviewers worked in all five subareas. 

This design is similar to the one described above. In this case all five interviewers are (presumably) available to interview the entire population, so L = 1. There is a stratification within interviewers' assignments
(the sampling by subareas). Results are presented on a “family” basis, 
although the sampling unit used was actually a structure. To simplify 
the use of Mahalanobis’ data for illustrative purposes, we shall ignore 
the stratification and clustering and treat the sample of families as if it 
were an unrestricted random sample of the population surveyed, the 
families being the individual members of this population. 

Mahalanobis made three studies in the Jagaddal area (in 1941, 1942, 
and 1945) all using approximately the same design. He also reports a 
study using a similar design (five subareas but only four interviewers) 
carried out in Nagpur in 1942-43 by M. P. Shrivastava. Table I shows estimates of \(\sigma_y^2\) and \(\sigma_{yI}\) for various characteristics made from the results of these surveys, assuming an unrestricted random sampling design. With a suitable cost function, these variance estimates can be used to determine the optimum number of interviewers. Suppose, for example, that C (the total survey budget) was $2000, that \(C_y\) (the cost per family) was $2, and that \(C_{yI}\) (the cost per interviewer for training, supervision, travel to the five areas to be enumerated, etc.) was $80. With these values (and the cost function \(C = C_y n + C_{yI} k\)), the optimum number, k, of interviewers and the optimum number, n, of families would be those shown in the last two columns of Table I. The analysis would point to the use of somewhere between five and eight interviewers for the Jagaddal study and to about six interviewers for the Nagpur study. It should be remembered, however, that the estimates \(s_{yI}\) are based on four degrees of freedom for the Jagaddal study and only three degrees of freedom for the Nagpur study. These estimates are, therefore, subject to a high sampling variance. As a matter of fact, the values reported for \(s_{yI}\) are entirely consistent with a zero value for \(\sigma_{yI}\). This situation points to the need for using more interviewers if we wish to estimate variances from the sample.

If the cost per interviewer \(C_{yI}\) had been taken as $4 instead of $80,
the optimum number of interviewers for estimating monthly per capita 
expenditures in Jagaddal would have been 49. In this case the use of 
only five interviewers would mean an 80 per cent increase in the vari- 
ance of our estimate as compared with the optimum. 
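Formulas (16)-(18) can be evaluated directly. The Python sketch below uses a hypothetical helper, `optimum_n_k`, that returns the unrounded optimum and the minimum variance. Following the practice described later for the Baltimore data, it treats a negative covariance estimate as zero. For the Nagpur figures in Table I (\(\sigma_y^2 = 399.1\), \(\sigma_{yI} = 0.80\), \(C_y = 2\), \(C_{yI} = 80\), \(C = 2000\) dollars) it gives k ≈ 5.52 and n ≈ 779. Rounding k to 6 and spending the rest of the budget on families gives n = 760.

```python
import math


def optimum_n_k(sigma2_y, sigma_yI, C_y, C_yI, C):
    """Unrounded optimum n and k from (16)-(18) for the cost function (15).

    A negative interviewer covariance is taken as zero; k then comes out as 0,
    meaning k should be set as small as administration allows.
    """
    sigma_yI = max(sigma_yI, 0.0)
    within = sigma2_y - sigma_yI
    denom = math.sqrt(C_y * within) + math.sqrt(C_yI * sigma_yI)
    A = C / denom                                  # (18)
    n = A * math.sqrt(within / C_y)                # (16)
    k = A * math.sqrt(sigma_yI / C_yI)             # (17)
    min_variance = denom ** 2 / C                  # value of (3) at the optimum
    return n, k, min_variance


n, k, v = optimum_n_k(399.1, 0.80, C_y=2, C_yI=80, C=2000)
print(round(n, 1), round(k, 2), round(v, 4))
```

The closed form spends the whole budget exactly (\(C_y n + C_{yI} k = C\)). The integer search over k in the previous sketch is only needed when rounding matters or when the cost function is not linear.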


TABLE I. Variance Estimates and Optimum Values of n and k
for Certain Indian Surveys

                                                   Interviewer    Total       Optimum number* of:
Survey           Characteristic                    contribution   variance    Interviewers   Individuals
                                                   (σ_yI)         (σ_y²)      (k)            (n)

Jagaddal, 1942   1. Monthly expenditure in         1.01           …           …              …
                    rupees per capita
                 2. Consumption of cereals in       .13           100.8       5              800
                    pounds per head per month
Nagpur, 1943     3. Total monthly expenditures      .80           399.1       6              760

* Values giving minimum variance subject to the cost restriction that 2n + 80k = C = 2000.


A small experiment similar to those of Mahalanobis was conducted 
in Baltimore by the Bureau of the Census as part of the December 1947 
monthly survey of the labor force. In this study, segments (small areas) 
were selected for interview in the Baltimore area. These segments had 
an expected size of six households. The households in 25 of the segments 
were divided into two sets of alternate households. Two enumerators 
were assigned to each of the 25 segments and given (at random) one of 
the sets of households for interview. Interviewers A and B shared six 
segments,⁶ interviewers B and C shared five, interviewers A and C
shared five and interviewers C and D shared nine. 

The situation in this study is approximated reasonably well by the 
specified mathematical model, if we assume that interviewers A, B, 
and C were drawn from one interviewer group and interviewers D and 





⁶ To simplify calculations, one of these was eliminated at random.
E from another. The sample design is, of course, different, but the difference requires only minor modifications of the formulas presented
above. 

To determine the optimum allocation of resources for the Baltimore 
study design, let n equal the number of segments and assume the follow- 
ing costs: 

C = total budget = $400.
\(C_y\) = cost per segment (using one interviewer to cover each segment) = $6.
\(C_{yI}\) = cost per interviewer = $7.


Table II shows the values of \(s_y^2\) and \(s_{yI}\) determined from the Baltimore study data, and the optimum values of n and k with the cost function (15). In the Baltimore study two interviewers were assigned to each segment. The estimate \(s_y^2\), however, represents between-segment variance in segment totals, and the estimate \(s_{yI}\) represents (approximately) between-interviewer variance in the average per segment. The optimum values n and k were determined for the case in which only one interviewer is assigned to any segment.


TABLE II. Variance Estimates and Optimum Values of n and k
for the Baltimore Survey

                                      Number of persons per segment
                      Total      Under 14    Total       Employed at       Operating own
                      persons    years of    employed    nonfarm job for   business or
                                 age                     wages or salary   profession

Variance estimates
  s_y²                …          …           …           …                 1.51
  s_yI                …          …           …           …                 …

Optimum values
  n                   …          …           …           …                 64
  k                   …          …           …           …                 2
  s_ȳ²                …          …           …           …                 0.024

Variance of ȳ with:
  n = 64 and k = 2    …          …           …           …                 0.024
  n = 62 and k = 4    …          …           …           …                 0.024
  n = 57 and k = 8    …          …           …           …                 0.026
  n = 48 and k = 16   …          …           …           …                 0.031
  n = 31 and k = 31   …          …           …           …                 0.049
It will be noted that \(s_{yI}\) is negative for three of the five characteristics. Negative values of \(s_{yI}\) frequently will be obtained when \(\sigma_{yI}\) is near zero (since \(s_{yI}\) is an unbiased estimate of \(\sigma_{yI}\)) and are particularly likely to occur when \(s_{yI}\) is based on a small number of degrees of freedom (i.e., relatively few interviewers), making the variance of \(s_{yI}\) relatively large. Where \(s_{yI}\) is negative, we have taken \(\sigma_{yI}\) as zero in estimating \(\sigma_{\bar{y}}^2\) and the optimum values of n and k. In these cases, of course, the optimum requires that k be as small as possible (i.e., k = 2, the number of interviewer groups).
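The "Variance of ȳ" rows of Table II can be checked with a few lines of Python. The sketch below assumes, as the text does, that a negative \(s_{yI}\) is replaced by zero. With that substitution, (3) reduces to \(s_y^2/n\). The value 1.51 is the \(s_y^2\) shown in Table II for a characteristic whose optimum is k = 2, so its \(s_{yI}\) was negative. The value −0.1 is an arbitrary stand-in for that negative estimate, and any negative value gives the same output. The function name is hypothetical.

```python
def variance_of_mean(s2_y, s_yI, n, k):
    """Equation (3) with sample estimates; a negative s_yI is taken as zero."""
    s_yI = max(s_yI, 0.0)
    return (s2_y - s_yI) / n + s_yI / k


# s_y^2 = 1.51 from Table II; -0.1 stands in for a negative s_yI.
for n, k in [(64, 2), (62, 4), (57, 8), (48, 16), (31, 31)]:
    print(n, k, round(variance_of_mean(1.51, -0.1, n, k), 3))
# prints 0.024, 0.024, 0.026, 0.031, 0.049 in the last column
```

Once \(s_{yI}\) is set to zero, k drops out of the variance. Every extra interviewer then only costs money, which is why k = 2 is optimal for this characteristic.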

For two characteristics (total persons and persons employed at a nonfarm job for wages and salaries) there is some contribution of interviewer error to the total variance. For these characteristics the optimum is fairly broad, i.e., for k between 4 and 16 the variance of \(\bar{y}\) will be within 13 per cent of the optimum.


USE OF THE SPECIFIED MATHEMATICAL MODEL IN 
MINIMIZING THE EFFECT OF BOTH 
BIAS AND VARIANCE 


The preceding section indicates a method for determining the optimum under fixed essential conditions. In many cases where it is evident that a particular survey technique is subject to substantial response bias, alternative techniques may be available that will reduce the bias. We must, of course, consider the relative cost of such alternatives.


CHOOSING A SINGLE SAMPLING DESIGN 


We may have a choice of alternative methods, each with different 
essential conditions, response bias, and optimum values of n and k. For 
a fixed total cost we can determine the optimum values of n and k for 
each such method. Then the optimum method among those examined is 
the one which gives the lowest mean square error. For example, experi- 
ence in determining farm expenditures by direct questioning of farm 
operators has shown that the results are often subject to gross error. 
Determining farm expenditures by other techniques, such as detailed 
examination of purchase records, may be more accurate but is con- 
siderably more expensive. We can determine the optimum for direct 
questioning and for detailed examination of purchase records, subject 
to a fixed total budget, and select the method which gives the lower 
mean square error. The optimum method for one budget level may be
different from that for another budget level. 
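For the simple cost function (15), substituting (16)-(18) into (3) gives the smallest variance attainable with budget C: \(\left(\sqrt{C_y(\sigma_y^2 - \sigma_{yI})} + \sqrt{C_{yI}\sigma_{yI}}\right)^2 / C\). Each method's minimum mean square error is therefore \(R_y^2\) plus that amount. The Python sketch below ranks candidate methods this way. The two methods, their labels, and all their figures are invented for illustration and do not come from any survey.

```python
import math


def min_mse(bias, sigma2_y, sigma_yI, C_y, C_yI, C):
    """Smallest M.S.E. (2) reachable by one method under cost function (15).

    Plugging the optimum (16)-(18) into (3) leaves a variance of
    (sqrt(C_y*(sigma2_y - sigma_yI)) + sqrt(C_yI*sigma_yI))**2 / C.
    """
    sigma_yI = max(sigma_yI, 0.0)       # negative covariance estimates taken as zero
    root = math.sqrt(C_y * (sigma2_y - sigma_yI)) + math.sqrt(C_yI * sigma_yI)
    return bias ** 2 + root ** 2 / C


# Hypothetical candidates: (label, response bias, sigma2_y, sigma_yI, C_y, C_yI).
methods = [
    ("direct questioning", 8.0, 400.0, 4.0, 1.0, 50.0),
    ("purchase records", 1.0, 300.0, 1.0, 20.0, 50.0),
]
budget = 10_000.0
for label, *params in methods:
    print(label, round(min_mse(*params, budget), 2))
best = min(methods, key=lambda m: min_mse(*m[1:], budget))
print("lowest M.S.E.:", best[0])
```

With these made-up inputs the cheap method has the smaller variance, but its squared bias dominates, so the more expensive method wins. Rerunning with a different `budget` can change the ranking, as the text notes.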
USE OF DOUBLE SAMPLING


In some cases, instead of using a single method, a combination of 
two of the methods in a double sampling design may prove more effi- 
cient. For example, we could interview a relatively large number of 
cases (possibly the entire population) by one of the cheaper (and less
accurate) methods and reinterview a subsample by one of the more 
expensive methods. Such a double sampling approach is likely to be 
useful in instances where methods with low response bias cost many 
times as much as methods with higher response bias. 

Suppose that our original sample is drawn as described in the previous section and we have sampled n individuals and k interviewers (\(k_A\) from the A-th group). For this sample, we obtain responses \(y_{Abc}\) under the essential conditions of the initial survey, which we shall designate as essential conditions Y. For the subsample we take (at random) an equal number out of the individuals assigned to each interviewer, giving a subsample of \(n'\) individuals. For the subsample we shall use a set of \(L'\) interviewer groups (which may or may not be the same as the original interviewer groups). We draw \(k'\) interviewers (\(k_A'\) from the A-th group) in such a manner that an equal number of interviews can be given to each interviewer. It will be noted that the interviewers for the subsample are drawn independently of those for the original sample and that \(k'\) can be less than, equal to, or greater than k. For the subsample we have the responses \(y_{Abc}\) obtained by the original interviewers, and we also have responses \(z_{A'b'c}\) obtained under essential conditions Z by the second set of interviewers. We use as an estimate⁷ of the true population mean X:

\[ z = \frac{\bar{z}'}{\bar{y}'}\,\bar{y} \qquad (19) \]

where

\(\bar{y}\) = mean of \(y_{Abc}\) values for the entire sample of n individuals,
\(\bar{y}'\) = mean of \(y_{Abc}\) values for the subsample of \(n'\) individuals,
\(\bar{z}'\) = mean of \(z_{A'b'c}\) values for the subsample of \(n'\) individuals.


Actually, it might be more efficient from a sampling viewpoint to 
draw the k’ interviewers for the subsample of clusters as a subsample 
of the k interviewers used for the original sample of clusters. One of the 
main purposes of a double sample design, however, is to reduce the re- 
sponse bias, and frequently this will require the use of better qualified 
or better trained interviewers. Under these conditions, the second set 
of interviewers may be drawn from a different population of inter- 
viewers. 
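Estimate (19) is a ratio adjustment. The mean from the cheap method on the full sample is scaled by how much the better method differs from the cheap one on the subsample. The Python sketch below computes it. The function name and the small data set are hypothetical. The subsample responses must be paired by individual and listed in the same order for both methods.

```python
from statistics import fmean


def double_sampling_estimate(y_all, y_sub, z_sub):
    """Ratio estimate (19): z = ybar * (zbar' / ybar').

    y_all: Method Y responses for the whole sample of n individuals.
    y_sub, z_sub: Method Y and Method Z responses for the same n' subsample
    individuals, listed in the same order.
    """
    if len(y_sub) != len(z_sub):
        raise ValueError("subsample responses must be paired")
    return fmean(y_all) * fmean(z_sub) / fmean(y_sub)


# Hypothetical data: the cheap method under-reports; the re-interview corrects it.
y_all = [90, 110, 95, 105, 100, 80, 120, 100]
y_sub = [90, 105, 120]          # Method Y responses of the subsampled individuals
z_sub = [99, 115, 132]          # Method Z responses for the same individuals
print(double_sampling_estimate(y_all, y_sub, z_sub))
```

Because it is a ratio of random variables, the estimate is biased but consistent, as footnote 7 states. The bias becomes negligible as \(n'\) grows.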





⁷ The estimate z is a ratio of random variables and is biased but consistent.
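THE MEAN SQUARE ERROR WITH DOUBLE SAMPLING


It is assumed that interviews under conditions Z are more expensive than under conditions Y and that method Z has a considerably smaller response bias. For the specified mathematical model, the mean square error of z will be approximately (see Appendix pp. 186-189):

\[ \text{M.S.E. of } z = R_z^2 + \frac{u}{n} + \frac{v}{n'} + \frac{w}{k'} \qquad (20) \]

where

\[ Z = E(\bar{z}') = E(z), \qquad Y = E(\bar{y}) = E(\bar{y}'), \qquad R_z = Z - X, \]

\[ u = 2\,\frac{Z}{Y}\,\rho_{yz}\sigma_y\sigma_z - \frac{Z^2}{Y^2}\left(\sigma_y^2 - \sigma_{yI}\right), \qquad (21) \]

\[ v = \sigma_z^2 - \sigma_{zI} - u, \qquad (22) \]

\[ w = \sigma_{zI}, \qquad (23) \]

\(\sigma_z^2\) and \(\sigma_{zI}\) being defined for conditions Z as \(\sigma_y^2\) and \(\sigma_{yI}\) are for conditions Y, and

\(\rho_{yz}\) = correlation between the expected y and z values for the same individual.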


AN OPTIMUM DOUBLE SAMPLING DESIGN


With a combination of two methods there is, in general, a set of optimum values for n, n' and k'. As in the preceding section we shall consider only the case where the cost function is simple and the optimum values can be determined directly. For purposes of illustration we shall assume the cost function:

\[ C = C_y n + C_{yI} k + C_z n' + C_{zI} k' \qquad (24) \]

where

\(C_y\) = cost per individual under conditions Y,
\(C_{yI}\) = cost per interviewer under conditions Y,
\(C_z\) = cost per individual under conditions Z,
\(C_{zI}\) = cost per interviewer under conditions Z,
C = total survey budget.
Since the M.S.E. of z does not involve k, to the order of approximation used in (20),⁸ but the cost increases with k, the optimum design would call for making k as small as possible. Usually, the minimum number of interviewers will be determined by administrative considerations, i.e., an interviewer can be expected to complete a certain number of interviews a day and, if the survey results must be available at some specified time, we must give an interviewer no more than the number of cases he can complete within the time period allowed. If, then, we decide that an interviewer shall not do more than \(t_y\) interviews, the smallest value we can give to k is \(k = n/t_y\), and the optimum values of n, n', and k' are:

\[ n = A\sqrt{\frac{u}{C_y + C_{yI}/t_y}} \qquad (25) \]

\[ n' = A\sqrt{\frac{v}{C_z}} \qquad (26) \]

\[ k' = A\sqrt{\frac{w}{C_{zI}}} \qquad (27) \]

where

\[ A = \frac{C}{\sqrt{u\left(C_y + C_{yI}/t_y\right)} + \sqrt{v\,C_z} + \sqrt{w\,C_{zI}}}. \qquad (28) \]





Using the optimum values in the formula for the M.S.E. of z, (20) will
permit us to compare use of a combination of Methods Y and Z with 
use of either method alone or with other methods and combinations to 
determine the optimum design. 
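With u, v and w in hand, (25)-(28) give the double-sampling allocation directly. The Python sketch below uses two hypothetical helpers. The first computes n, \(k = n/t_y\), \(n'\) and \(k'\). The second evaluates the approximate M.S.E., taking (20) in the form \(R_z^2 + u/n + v/n' + w/k'\), consistent with (25)-(28). The sketch requires u, v and w to be positive. It does not enforce \(n' \le n\) or round to whole interviewers. The inputs in the usage lines are arbitrary illustrations.

```python
import math


def optimum_double_sample(u, v, w, C_y, C_yI, C_z, C_zI, t_y, C):
    """Optimum n, k, n', k' from (25)-(28), with k tied to n by k = n / t_y."""
    if min(u, v, w) <= 0:
        raise ValueError("u, v and w must be positive for this closed form")
    c_first = C_y + C_yI / t_y        # cost of one first-phase case incl. its share of an interviewer
    A = C / (math.sqrt(u * c_first) + math.sqrt(v * C_z) + math.sqrt(w * C_zI))  # (28)
    n = A * math.sqrt(u / c_first)     # (25)
    n_sub = A * math.sqrt(v / C_z)     # (26)
    k_sub = A * math.sqrt(w / C_zI)    # (27)
    return n, n / t_y, n_sub, k_sub


def mse_double(R_z, u, v, w, n, n_sub, k_sub):
    """Approximate M.S.E. of z, taking (20) as R_z^2 + u/n + v/n' + w/k'."""
    return R_z ** 2 + u / n + v / n_sub + w / k_sub


# Arbitrary illustrative inputs (not taken from Tables III-V).
n, k, n_sub, k_sub = optimum_double_sample(u=50.0, v=400.0, w=9.0, C_y=1.0, C_yI=20.0,
                                           C_z=20.0, C_zI=150.0, t_y=100, C=12_500.0)
print(round(n), round(k), round(n_sub), round(k_sub))
print(round(mse_double(0.5, 50.0, 400.0, 9.0, n, n_sub, k_sub), 3))
```

The allocation spends the budget (24) exactly. Shifting money from the subsample to the first phase, or the reverse, can only raise the value from `mse_double`.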


ESTIMATION OF VARIANCES AND BIASES


We must, of course, have some idea of the costs and of the values of u, v, w and \(R_z\). We can estimate Y and Z from a sample, using \(\bar{y}\) as an estimate of Y and z as an estimate of Z. The variances can be estimated by (8) and (9), and an unbiased estimate of \(\rho_{yz}\sigma_y\sigma_z\) is provided by:





⁸ Formula (20) is an approximation which ignores moments of the third and higher degree. Where k, n, k' or n' is small, some of these moments may be appreciable and the approximation to the mean square error of z may be poor.
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n’ 
p YAboZa'o’c — 19'S" 
¢e 


r >= . 2 
ys8y8 ig (29) 
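Estimate (29) is the usual unbiased sample covariance of the paired subsample responses, written in computing form. The hypothetical Python helper below uses the centred form \(\sum (y - \bar{y}')(z - \bar{z}')/(n' - 1)\) instead. That form is algebraically identical but loses less precision in floating point when responses are large, as with dollar inventories.

```python
from statistics import fmean


def cov_estimate(y_sub, z_sub):
    """Estimate of rho_yz * sigma_y * sigma_z from paired subsample responses.

    Uses the centred form sum((y - ybar')(z - zbar')) / (n' - 1), which is
    algebraically the same as (sum(y*z) - n' * ybar' * zbar') / (n' - 1).
    """
    m = len(y_sub)
    if m != len(z_sub) or m < 2:
        raise ValueError("need at least two paired responses")
    y_bar, z_bar = fmean(y_sub), fmean(z_sub)
    return sum((y - y_bar) * (z - z_bar) for y, z in zip(y_sub, z_sub)) / (m - 1)


# Hypothetical paired responses for four subsample individuals.
print(cov_estimate([1, 2, 3, 4], [2, 4, 5, 9]))
```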





Estimation of the response bias (R,) is a more difficult problem since 
this error involves the unknown true population mean X. However, 
a satisfactory comparison of several methods can sometimes be made if 
one is justified in assuming a negligible response bias for the method 
which is considered most accurate and in estimating (from previous 
experience or a pilot sample study) the differences in expected value 
between this most accurate method and the other methods used. 

For example, if Method Z is one which is subject to negligible response bias, we can use as an estimate of bias for some other method either

\[ r_y' = \bar{y}' - \bar{z}' \qquad \text{or} \qquad r_y = \bar{y} - z. \]

These estimates are, of course, subject to sampling error. Formulas for the variances of \(r_y'\) and \(r_y\) are given in the Appendix (equations 112 and 116).


ILLUSTRATION OF JOINTLY MINIMIZING VARIANCE AND BIAS


To illustrate the technique for determining the method which mini- 
mizes the mean square error, we shall use a problem which involves 
estimating the average dollar inventory of a group of retail stores. Let 
us assume the population consists of all retail stores in a large city and 
that our budget for the survey is $15,000, of which $2500 has been 
set aside for fixed overhead. Then C = 12,500. We shall also assume that 
the maximum assignment to an interviewer (\(t_y\)) is as shown in Table
III. Suppose that pilot studies and previous experience give cost, vari- 
ance, correlation, and response bias estimates for five different tech- 
niques, and that we wish to determine which technique or combination 
of two techniques to use. Let us assume that the estimates of unit costs, 
response bias, and variances for each technique are as shown in Table 
III, and that the correlations, \(\rho_{yz}\), for each pair of techniques are as
shown in Table IV. We shall take X = $100,000. 
TABLE III. Cost Factors, Maximum Assignments, Biases and Variances
for a Study of Retail Store Inventories
(Hypothetical)

          Unit costs in dollars    Maximum          Response bias    Square root of variances
          Per        Per           assignment per   in thousands     and covariances in
Method    store      interviewer   interviewer      of dollars       thousands of dollars
          C_y        C_yI          t_y              R_y              σ_y        √σ_yI

1         0.25       25            100              −11              83         25
2         2          50            60               −6               80         15
3         6          100           40               −2.…             76         10
4         12         150           35               …                73         9
5         20         150           35               …                71         6























TABLE IV. Correlation, ρ_yz, between Expected Values
of Individual Responses
(Hypothetical)

[Correlations for each pair of Methods 1 to 5; the only legible entry in this copy is 0.95.]





Table V shows the mean square error which would be obtained for 
each method and each combination of methods, using with the single 
sampling method the values of n and k given by (16) and (17), and with 
the double sampling method the values of n, n’, and k’ given by (25), 
(26), and (27), with \(k = n/t_y\).

The optimum, if only a single method is used, is Method 4 with 
n=724 and k=25. However, double sampling permits a further reduc- 
tion of 35 per cent in the mean square error by using Methods 1 and 5 
with n=3480; k=35; n’=382 and k’=21. In most cases, double 
sampling will not give gains of this magnitude over a good single sam- 
pling method. It should be noted that the figures used in Tables IV and 
V are hypothetical and are used only to illustrate the methods. 
TABLE V. Optimum Values of n, k, n', k' for the Hypothetical
Example of Tables III and IV

                 Optimum    Minimum mean square error
Methods          n          for the indicated method

1                12,022     123.2
2                 3,197      39.8
3                 1,350      12.…
4                   723      11.…
5                   508      12.…
1 and 2           5,242      39.…
1 and 3           4,382      …
1 and 4           3,582      …
1 and 5           3,480      …
2 and 3           1,521      …
2 and 4           1,340      …
2 and 5           1,329      …
3 and 4             703      …
3 and 5             718      …
4 and 5             481      …




















In situations of this type it may frequently be necessary to increase 
expenditures many times in order to reduce the response bias from 10 
per cent to 2 per cent. For example, one company which compiles data 
on sales of individual commodities by retail stores has found that only
by personally checking physical inventory and purchase invoices can 
they obtain accurate reports on sales, but they feel that it pays in 
increased accuracy of response to spend considerable time making the 
necessary checks. As another example, the Bureau of the Census re- 
interviewed a sample of respondents, using in the re-interview profes- 
sional personnel from the Washington office. The average cost per 
re-interview was approximately 7 times the average cost for the origi- 
nal interview, and there was a significant increase in the accuracy of 
certain items, such as coverage of persons. On the other hand, in some 
cases increases in expenditure may yield only small gains in accuracy 
or large gains for some items and small gains for others. In the same 
study of the Bureau of the Census, the per cent distribution into 10 
year age groups, for example, was practically identical for both interviews, with none of the 10 year age groups differing by more than ½ of 1 per cent from the original interview to the re-interview. If the primary
aim of the survey were to obtain an accurate per cent distribution by 
age, use of the more expensive method would not be justified. 

It is also possible that very small increases in expenditures may 
produce large differences in accuracy. The Bureau of the Census in its 
Monthly Report on the Labor Force, for example, had been getting a 
large number of persons erroneously reported as not in the labor force. 
A revision of the questions asked added nearly 2 million of these 
“missed” persons to the labor force [1]. The revision added practically 
nothing to the cost of the survey. 

Thus, there is no “typical” relation between cost and accuracy. Each 
survey presents its own picture. The method outlined here is general 
in its applicability, although the answer obtained will vary. 

It should be noted that the work of determining the “optimum” de- 
sign can frequently be shortened by eliminating from consideration 
alternatives which are obviously inefficient. For example, Method 5 in 
the illustration above involves a cost per store two-thirds greater than 
that of Method 4, but the response bias for the two methods differs by 
only a trivial amount. The higher expenditure per unit in Method 5 
improves the values for individual stores, but the individual response 
errors of Method 4 are largely “compensating” in nature. Although the 
combination of Methods 1 and 5 gives the lowest mean square error, 
the result does not differ appreciably from that for Methods 1 and 4. 
Thus, consideration of both Method 4 and Method 5 was really unnecessary in selecting an optimum. As will be indicated below, increasing unit expenditures to reduce individual response variance or interviewer variance is usually undesirable when the response errors are
compensating and the increased expenditure does not affect the re- 
sponse bias. 


EFFECT OF UNCORRELATED AND COMPENSATING 
RESPONSE ERRORS 

A consideration of the specified mathematical model leads to the 
conclusion that response errors that are uncorrelated with each other 
and compensating in character do not necessarily need any special 
attention in survey design, if the purpose is to estimate a mean or total 
for the total population. In this case, furthermore, the conventional 
formulas for estimating sampling error from a sample reflect the re- 
sponse error properly, and no special attention need be given to the 
presence of response errors. This situation is, however, often assumed 
without valid evidence. It is not uncommon for the results of a survey 
to be justified on the basis that "some of the errors were positive and
some were negative, so the net effect is undoubtedly zero.” Obviously, 
it is not possible to assume that, because positive and negative 
errors are both present, the net error is necessarily zero. Moreover, as 
we have already seen, if the response errors are correlated with each 
other, as within the work of a single interviewer, the variance is in- 
creased and the chances of errors being compensating are reduced. Let 
us consider, however, the situation where there is evidence that the 
errors are, in fact, compensating and uncorrelated. 

The essential points can be seen easily in the very simple situation in 
which the response errors are uncorrelated for any two individuals in 
the population and a random sample of individuals is drawn without 
restriction. The variance of a sample estimate of the true population 
mean under these conditions is: 


\[ \sigma_{\bar{y}}^2 = \frac{\sigma_y^2}{n}. \qquad (30) \]

Since \(y_{ij} = x_i + r_{ij}\) (where \(r_{ij}\) is the response error), we can express \(\sigma_y^2\) as:

\[ \sigma_y^2 = \sigma_x^2 + \sigma_{ry}^2 + 2\rho_{xry}\,\sigma_x\sigma_{ry} \qquad (31) \]

where

\(\sigma_x^2\) = variance of the true values,

\(\sigma_{ry}^2\) = variance of the response errors, which is composed of the variance of response errors for an individual around the individual's expected response error and the variance in the expected value of response errors between individuals,

\(\rho_{xry}\) = correlation between \(x_i\) (the true value for the i-th individual) and \(\bar{r}_i\) (the expected response error for the i-th individual).


RESPONSE ERRORS ARE REFLECTED IN USUAL SAMPLE VARIANCE 


We see from (31) that the variance of y reflects any effects of the response errors as well as the variance of the true values. Similarly, if we estimate \(\sigma_y^2\) by

\[ s_y^2 = \frac{\sum_i \left(y_i - \bar{y}\right)^2}{n - 1}, \]

the effect of the response errors will appear in the estimated variance, since the expected value of \(s_y^2\) is equal to \(\sigma_y^2\). Consequently the estimated variance of the sample mean will include appropriate allowance for the response variation.
REDUCING THE EFFECT OF RESPONSE VARIATION


If we assume a fixed total budget, the number of cases which can be sampled is equal to C, the total budget (after deducting fixed overhead), divided by the unit cost \(C_y\). Thus, we have:

\[ \sigma_{\bar{y}}^2 = \frac{\sigma_y^2\,C_y}{C}. \qquad (32) \]


Let us compare the results of Method Y (which gives the response \(y_i\)) with those of Method Z (which gives the response \(z_i\)), assuming that Method Z has a unit cost \(C_z > C_y\), that \(\sigma_{rz} < \sigma_{ry}\), and that \(R_y = R_z = 0\). Then Method Z is preferable to Method Y only if:

\[ \sigma_{\bar{z}}^2 < \sigma_{\bar{y}}^2 \qquad (33) \]

or if:

\[ \frac{\sigma_z^2}{\sigma_y^2} < \frac{C_y}{C_z}. \qquad (34) \]

Inequality (34) provides a test of the relative efficiency of two data-collecting techniques which have either no response bias or the same response bias. Let us consider a hypothetical example. Suppose we have a choice between two methods of estimating a characteristic, both methods using an unrestricted random sample of families, and:


                                     Method Y    Method Z
Unit cost                              $2          $5
S.d. of response errors, σ_r           10           2
Correlation, ρ_xr                      0.1         0.1
S.d. of true values, σ_x               10          10


Since both methods have the same response bias, Method Z will be more efficient than Method Y only if (34) holds. By (31), the figures above give $\sigma_y^2 = 100 + 100 + 2(0.1)(10)(10) = 220$ and $\sigma_z^2 = 100 + 4 + 2(0.1)(10)(2) = 108$, so that $C_y/C_z = 0.4$ and $\sigma_z^2/\sigma_y^2 = 108/220 = 0.49$. Thus (34) does not hold, so we gain more by putting our funds into the larger sample permitted by the lower unit cost of Method Y than by putting them into the (substantial) reduction of response error permitted by Method Z.
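The arithmetic of the example can be reproduced directly from (31) and (32). The budget figure below is a hypothetical value; it cancels out of the comparison, so any positive amount gives the same ranking.

```python
def response_variance(sigma_x, sigma_r, rho_xr):
    """Variance of a single response, from (31)."""
    return sigma_x**2 + sigma_r**2 + 2 * rho_xr * sigma_x * sigma_r


def mean_variance_for_budget(sigma2, unit_cost, budget):
    """Variance of the sample mean when n = budget / unit_cost, as in (32)."""
    return sigma2 * unit_cost / budget


sigma2_y = response_variance(10, 10, 0.1)   # Method Y
sigma2_z = response_variance(10, 2, 0.1)    # Method Z

budget = 1000.0  # hypothetical budget after overhead; it cancels out of the comparison
var_y = mean_variance_for_budget(sigma2_y, 2, budget)
var_z = mean_variance_for_budget(sigma2_z, 5, budget)

# Inequality (34): Method Z is preferable only if sigma_z^2 / sigma_y^2 < C_y / C_z
method_z_preferable = sigma2_z / sigma2_y < 2 / 5
print(sigma2_y, sigma2_z, round(var_y, 3), round(var_z, 3), method_z_preferable)
```

For a budget of 1,000 the variance of the mean is 0.44 under Method Y and 0.54 under Method Z, which matches the conclusion that the cheaper, noisier method is preferable in this example.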

Ordinarily, any appreciable increase in expenditure to decrease re- 
sponse variation would be unwarranted if there is no effect on the re- 
sponse bias. 

176 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1951

If we are estimating the proportion of the population having a given characteristic, any expenditure to reduce response errors would be wasted whenever the response errors are compensating so that $\bar Z = \bar Y$, since, in this case, $\sigma_z^2 = \sigma_y^2$: for a characteristic scored 0 or 1, the variance $P(1 - P)$ is fixed by the proportion $P$ itself.
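A small illustration of compensating errors for a proportion, under a simple misclassification model that is assumed here rather than taken from the paper: a fraction of true 1s are reported as 0 and a fraction of true 0s are reported as 1. When the two flows are equal, the reported proportion equals the true one, and therefore so does the variance $P(1 - P)$.

```python
def reported_proportion(p, miss_rate, false_rate):
    """Expected reported proportion when a fraction miss_rate of true 1s report 0
    and a fraction false_rate of true 0s report 1 (hypothetical misclassification model)."""
    return p * (1 - miss_rate) + (1 - p) * false_rate


p = 0.2          # hypothetical true proportion
miss_rate = 0.10
# Choose the false-report rate so the two kinds of error cancel: p * miss == (1 - p) * false
false_rate = p * miss_rate / (1 - p)

q = reported_proportion(p, miss_rate, false_rate)
print(f"reported proportion {q:.4f}, variance {q * (1 - q):.4f} vs true variance {p * (1 - p):.4f}")
```

Individual responses are still wrong 10 per cent of the time among true 1s, yet neither the mean nor the variance of the reported characteristic differs from the error-free values.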

The difficulty in sample design is, however, that there is rarely any 
assurance that response errors are, in fact, compensating. Usually 
there will be a net response bias and we cannot concentrate on sampling 
variance and assume that response errors are of no importance. Even 
if we were to reduce sampling error to negligible proportions by taking 
a complete census, we may still have a substantial mean square error 
because of response bias. 
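The mean square error makes this concrete: it is the variance plus the squared bias, and only the variance shrinks as the sample grows. The sketch assumes simple random sampling without replacement and a hypothetical response-error model with a constant bias $R$ and an uncorrelated trial-to-trial component of standard deviation $\sigma_d$. The finite population correction removes the sampling variance of the true values in a complete count, but the squared bias remains.

```python
def mean_square_error(n, N, sigma_x, sigma_d, bias):
    """MSE of a sample mean under a simplified response-error model (an assumption):
    response = true value + constant bias + uncorrelated error with s.d. sigma_d."""
    sampling = (N - n) / (N - 1) * sigma_x**2 / n   # vanishes for a complete count (n == N)
    response = sigma_d**2 / n                       # uncorrelated response variance
    return sampling + response + bias**2            # squared bias does not shrink with n


N = 1_000_000
for n in (1_000, 10_000, 100_000, N):
    print(n, round(mean_square_error(n, N, sigma_x=10, sigma_d=5, bias=0.5), 6))
```

Even the complete count is left with a mean square error of about 0.25, essentially all of it the squared bias.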

There has in recent years been considerable success in the control 
of sampling errors. Attention has also been devoted to response varia- 
tion, e.g., the work of Mahalanobis [4] and work in psychology on 
“reliability.” Most textbooks in psychological or educational statistics 
or in tests and measurements discuss reliability at considerable length. 
In the psychological testing field, attention has also been given to prob- 
lems of response bias (“validity”). In the survey field, there has been 
very little investigation of the problem of response bias. 


APPLICABILITY OF THE SPECIFIED 
MATHEMATICAL MODEL 


The analysis presented in this paper applies, of course, only when 
the conditions of the specified mathematical model obtain. It is im- 
portant, therefore, that we examine these conditions in terms of the 
situations actually prevalent in typical surveys. With regard to the 
selection of individuals or other sampling units for interview, it is quite 
possible to use techniques, e.g., selection dependent upon a table of 
random numbers, which give random samples. A determination that 
individual responses behave like random variables, however, will re- 
quire much more experimental evidence than is now available, and will 
necessarily be subject to some question. 

As noted previously, the conditions which determine the response 
of any individual may be regarded as divided into two groups: 

(a) Those conditions which are “constant,” “controlled” and pre- 
determined for a given individual response, e.g., the questions to be 
asked, the type of interviewer, etc. We have referred to these as the 
essential survey conditions. 

(b) Those conditions which are adventitious and “unpredictable,” 
e.g., the mood of the respondent, a momentary distraction which re- 
sults in a question being misunderstood, etc. 

This division is similar to the division between “assignable (i.e., con- 
trollable) causes” and “residual” causes of variation in discussions of 
quality control. We have treated these two groups of factors in the way they are treated in the quality control field. Thus, we consider the “adventitious” factors as giving rise to a random variate, the response obtained for a given individual being one of the values of this variate. The “controlled” causes would determine the expected value of this random variate. They also affect its variance.

It should be noted that the present analysis does not provide for 
measuring the response variance of a single individual apart from the 
variance between individuals. Thus, a direct test of the random nature 
of individual response variation cannot be made by using the specified 
mathematical model. We can, however, determine the total variance 
and, by applying the specified mathematical model in a large number 
of cases, can test its approximation to actual conditions. 

Another feature of the specified mathematical model is the treatment 
of the effect of interviewers on survey accuracy in terms of a sample 
design in which the interviewers are a random selection from a “group” 
or pool of interviewers potentially available in a certain area. The 
analysis assumes that any given individual could be assigned only to 
one particular pool of interviewers, and that the interviewer is a mem- 
ber of only one pool. Actually, this condition exists only approximately. 
Two interviewers available for (and used in) a survey frequently have 
overlapping “areas of availability” but, in many cases, the overlap is 
not complete. In making assignments, a survey supervisor will have 
certain cases which he might assign to any one of several available 
interviewers and other cases which he would assign only to a certain 
interviewer. * 

Another difficulty comes from the assumption that the interviewers 
are drawn at random from the pool, that the available sample indi- 
viduals are assigned to them at random, and that the selection of inter- 
viewers is independent of the selection of individuals. Even where all 
the sample cases drawn in an area could be interviewed by any of the 
available interviewers, the survey supervisor will usually arrange an 
interviewer’s assignment to minimize travel costs, not at random. 
While the interviewers he selects may possibly be considered a random 
sample from some pool, the sampling of interviewers is not necessarily 
independent of the selection of individuals to be interviewed. 

In spite of these difficulties, the condition that interviewers be drawn 
at random and be given random assignments within the limits of avail- 
ability may approximate actual conditions reasonably well. We can, in 
fact, define our interviewer groups for any survey so that the condition 
holds. For example, we can think of the assignments which a given 
interviewer would get for each of the possible sets of n individuals and
define the A-th group of interviewers to include only interviewers who 
would get the same assignment as our given interviewer over every 
possible sample. This may result in placing only one interviewer in a 
given interviewer group. Although we cannot make an exact estimate 
of variance between interviewers under these conditions, an approxi- 
mation can be made by determining the variance between paired inter- 
viewer groups. The result would usually be an overestimate of the 
variance. 

In the application of statistical analysis to response errors there is a 
great need for experimental evidence to determine whether the mathe- 
matical model described earlier serves adequately to approximate the 
conditions actually obtaining in a given survey. Experience along these 
lines will most likely lead to important modifications in the analysis. 
The implications of this entire problem need thorough exploration and 
the analysis presented in the present paper can be considered only as a 
step toward a systematic treatment of response error. 


APPENDIX 
VARIANCE OF A SAMPLE MEAN 


We have a population divided into $N$ units. The unit may be a population element or it may be a “cluster” of elements (e.g., a household or a group of households living in an area). The $N$ units are divided into $L$ groups with $N_A$ in the $A$-th group. There are $K_A$ interviewers available to interview the units in the $A$-th group (and only these units). On any particular interview of a “unit” by an interviewer a response occurs. Suppose that this response is a random variate for any interviewer and any respondent, and let:


$P_{ABCD}$ = probability that the response $Y_{ABCD}$ will be obtained if the $B$-th interviewer in the $A$-th group interviews the $C$-th unit

and

$P_{ABCDUVW}$ = probability that responses $Y_{ABCD}$ and $Y_{AUVW}$ are obtained (in the $A$-th group) if the $B$-th and $U$-th interviewers interview the $C$-th and $V$-th units.


We have then:

$$\sum_D P_{ABCD} = 1, \qquad E_{ABC}Y_{ABCD} = \bar Y_{ABC} = \sum_D P_{ABCD}Y_{ABCD},$$

where the sum is taken over all possible responses of the $C$-th unit to the $B$-th interviewer. (The notation $E_R$ will be used to indicate the expected value for a fixed $R$.)


$$\sum_D\sum_W P_{ABCDUVW} = \sum_D P_{ABCD} = \sum_W P_{AUVW} = 1 \qquad (36)$$

$$E_{ABCUV}(Y_{ABCD}Y_{AUVW}) = \sum_D\sum_W P_{ABCDUVW}\,Y_{ABCD}Y_{AUVW} \qquad (37)$$

where the sum is taken over all possible responses of the $C$-th unit to the $B$-th interviewer and of the $V$-th unit to the $U$-th interviewer.
We shall assume that, if $B \ne U$ and $C \ne V$:

$$P_{ABCDUVW} = (P_{ABCD})(P_{AUVW}) \qquad (38)$$

so that:

$$E_{ABCUV}(Y_{ABCD}Y_{AUVW}) = \bar Y_{ABC}\bar Y_{AUV}. \qquad (39)$$


Let us suppose that the sample design calls for drawing units from 
the population as a whole and drawing interviewers independently 
within each group. These selections are made at random without re- 
placement. To each sample interviewer drawn from the A-th group we 
assign, at random, a certain number of those sample units drawn which 
come from the A-th group. We fix in advance: 


1) $n$ = the total number of sample units to be drawn.
2) $\bar n_A$ = the number of sample units to be assigned to each sample interviewer from the $A$-th group.


Since the $n$ units are drawn independently of the groups, the number of units falling in the sample in any group is a random variable. Let us designate the number of units in the sample from any group by $n_A$. Since we fix the number of sample units per interviewer in the $A$-th group at $\bar n_A$, the number of sample interviewers to be drawn in this group will be $k_A = n_A/\bar n_A$, and $k_A$ will also be a random variable.

Let

$y_{Abc}$ = response obtained for the $c$-th sample unit when interviewed by the $b$-th sample interviewer drawn from the $A$-th (population) group, and

$$\bar y = \frac{1}{n}\sum_A^L\sum_b^{k_A}\sum_c^{\bar n_A} y_{Abc}.$$
Contingent upon drawing a unit from the $A$-th group, the probability of $y_{Abc} = Y_{ABCD}$ is $(1/N_A)(1/K_A)P_{ABCD}$ for all values of $b$ and $c$ and, for fixed $A$, there are $n_A$ values of $y_{Abc}$. Thus:

$$E\bar y = \frac{1}{n}\sum_A^L E\left(n_A\,E_{n_A}y_{Abc}\right),$$

$$E_{n_A}y_{Abc} = \sum_B^{K_A}\sum_C^{N_A}\sum_D\frac{P_{ABCD}Y_{ABCD}}{K_AN_A} = \sum_B^{K_A}\sum_C^{N_A}\frac{\bar Y_{ABC}}{K_AN_A} = \bar Y_A,$$

and, since $E\,n_A = nN_A/N$, $E\bar y = \sum_A^L N_A\bar Y_A/N = \bar Y$.


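therefore

$$E\bar y^2 = \frac{\sum_A^L E\,n_A^2\bar y_A^2 + \sum_{A\ne T}^L E\,n_An_T\bar y_A\bar y_T}{n^2},$$

where $\bar y_A$ denotes the mean of the $n_A$ sample responses from the $A$-th group. Assuming $N$ to be large:

$$E\,n_An_T\bar y_A\bar y_T = (E\bar y_A\bar y_T)(E\,n_An_T) = \bar Y_A\bar Y_T\left[\frac{n(n-1)N_AN_T}{N^2}\right]$$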
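so that

$$E\bar y^2 = \frac{\sum_A^L E\,n_A^2\bar y_A^2}{n^2} + \frac{n(n-1)\left(\sum_A^L N_A\bar Y_A\right)^2 - n(n-1)\sum_A^L N_A^2\bar Y_A^2}{n^2N^2}$$

and

$$\sigma_{\bar y}^2 = E\bar y^2 - \bar Y^2 = \frac{\sum_A^L E\,n_A^2\bar y_A^2}{n^2} - \frac{\bar Y^2}{n} - \frac{(n-1)\sum_A^L N_A^2\bar Y_A^2}{nN^2}$$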





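$$E\,n_A^2\bar y_A^2 = E\left(\sum_b^{k_A}\sum_c^{\bar n_A}y_{Abc}\right)^2 = (E\,n_A)(E\,y_{Abc}^2) + (\bar n_A - 1)(E\,n_A)(E\,y_{Abc}y_{Abv}) + \left[E\,k_A(k_A-1)\bar n_A^2\right]\left[E\,y_{Abc}y_{Auv}\right] \qquad (51)$$

where $c \ne v$ and $b \ne u$.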


$$E\left[k_A(k_A-1)\bar n_A^2\right] = E\left(n_A^2 - \bar n_An_A\right) = \frac{n(n-1)N_A^2}{N^2} - \frac{(\bar n_A - 1)nN_A}{N}$$
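The expectation of $n_A^2 - \bar n_An_A$ can be checked exactly if $n_A$ is treated as binomial with $p = N_A/N$, which is the large-$N$ approximation used above. The sketch sums over the binomial distribution and compares the result with the closed form; the function names are illustrative only.

```python
from math import comb


def expected_n2_minus_nbar_n(n, p, nbar):
    """Exact E[n_A**2 - nbar * n_A] when n_A ~ Binomial(n, p)."""
    return sum(
        comb(n, m) * p**m * (1 - p) ** (n - m) * (m * m - nbar * m)
        for m in range(n + 1)
    )


def closed_form(n, p, nbar):
    """The expression in the text, n(n-1)N_A^2/N^2 - (nbar_A - 1) n N_A / N, with p = N_A / N."""
    return n * (n - 1) * p**2 - (nbar - 1) * n * p


print(expected_n2_minus_nbar_n(30, 0.3, 5), closed_form(30, 0.3, 5))
```

For $n = 30$, $p = 0.3$ and $\bar n_A = 5$ both give 42.3, using $E\,n_A^2 = np(1-p) + n^2p^2$.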





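$$E\,y_{Abc}y_{Auv} = \frac{\sum_{B\ne U}^{K_A}\sum_{C\ne V}^{N_A}E_{ABCUV}(Y_{ABCD}Y_{AUVW})}{K_A(K_A-1)N_A(N_A-1)} = \frac{\sum_{B\ne U}^{K_A}\sum_{C\ne V}^{N_A}\bar Y_{ABC}\bar Y_{AUV}}{K_A(K_A-1)N_A(N_A-1)}$$

For large $K_A$ and $N_A$:

$$E\,y_{Abc}y_{Auv} \approx \bar Y_A^2.$$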
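Thus:

$$E\,n_A^2\bar y_A^2 = \frac{nN_A}{N}\left(E\,y_{Abc}^2 - \bar Y_A^2\right) + \frac{nN_A}{N}\bar Y_A^2 + \frac{(\bar n_A-1)nN_A}{N}\left(E\,y_{Abc}y_{Abv} - \bar Y_A^2\right) + \frac{(\bar n_A-1)nN_A}{N}\bar Y_A^2 + \left[\frac{n(n-1)N_A^2}{N^2} - \frac{(\bar n_A-1)nN_A}{N}\right]\bar Y_A^2$$

$$= \frac{nN_A}{N}\sigma_{yA}^2 + \frac{(\bar n_A-1)nN_A}{N}\sigma_{yIA} + \frac{nN_A\bar Y_A^2}{N} + \frac{n(n-1)N_A^2\bar Y_A^2}{N^2},$$

where

$$\sigma_{yA}^2 = E\,y_{Abc}^2 - \bar Y_A^2 = \frac{\sum_B^{K_A}\sum_C^{N_A}E_{ABC}(Y_{ABCD} - \bar Y_A)^2}{K_AN_A},$$

$$\sigma_{yIA} = E\,y_{Abc}y_{Abv} - \bar Y_A^2 = \frac{\sum_B^{K_A}\sum_{C\ne V}^{N_A}E_{ABCV}(Y_{ABCD} - \bar Y_A)(Y_{ABVW} - \bar Y_A)}{K_AN_A(N_A-1)}.$$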


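Therefore:

$$\sigma_{\bar y}^2 = \frac{\sum_A^L N_A\sigma_{yA}^2}{nN} + \frac{\sum_A^L(\bar n_A-1)N_A\sigma_{yIA}}{nN} + \frac{\sum_A^L N_A(\bar Y_A - \bar Y)^2}{nN},$$

that is,

$$\sigma_{\bar y}^2 = \frac{\sigma_y^2}{n} + \frac{\sum_A^L(\bar n_A-1)N_A\sigma_{yIA}}{nN},$$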
where

$$\sigma_y^2 = \frac{\sum_A^L N_A\sigma_{yA}^2}{N} + \frac{\sum_A^L N_A(\bar Y_A - \bar Y)^2}{N} = \frac{\sum_A^L N_A\left(E\,y_{Abc}^2 - \bar Y^2\right)}{N}.$$
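A simulation sketch of this result for a single interviewer group ($L = 1$, so the between-group term vanishes), with a large pool of interviewers and a large population. The model is an assumption for illustration: each sampled interviewer adds a random systematic effect with variance $\tau^2$ to every response he or she takes, so that $\sigma_{yI} = \tau^2$ and $\sigma_y^2 = \sigma_x^2 + \tau^2 + \sigma_e^2$. The formula $\sigma_{\bar y}^2 = \sigma_y^2/n + (\bar n - 1)\sigma_{yI}/n$ is compared with the variance of $\bar y$ across replicated surveys.

```python
import random


def simulate_interviewer_effect(k=20, nbar=10, sigma_x=10.0, tau=2.0, sigma_e=5.0,
                                reps=4_000, seed=3):
    """Variance of ybar with k interviewers, each taking nbar randomly assigned cases.

    Hypothetical model: response = true value + interviewer effect + trial error,
    with the interviewer effect shared by all cases of the same interviewer.
    """
    rng = random.Random(seed)
    n = k * nbar
    means = []
    for _ in range(reps):
        total = 0.0
        for _ in range(k):
            effect = rng.gauss(0, tau)            # drawn once per sampled interviewer
            for _ in range(nbar):
                total += rng.gauss(0, sigma_x) + effect + rng.gauss(0, sigma_e)
        means.append(total / n)
    grand = sum(means) / reps
    simulated = sum((m - grand) ** 2 for m in means) / reps

    sigma_y2 = sigma_x**2 + tau**2 + sigma_e**2   # total variance of a single response
    sigma_yI = tau**2                             # covariance of two responses taken by one interviewer
    formula = sigma_y2 / n + (nbar - 1) * sigma_yI / n
    return simulated, formula


simulated, formula = simulate_interviewer_effect()
print(f"simulated {simulated:.3f}, formula {formula:.3f}")
```

Here the formula gives 0.825, against 0.645 if the interviewer term were ignored ($\sigma_y^2/n$ alone). The extra term grows with the number of cases $\bar n$ given to each interviewer, even though the correlation between responses is small.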
If we let:
If $\bar n_A = \bar n = n/k$:
VARIANCE OF A MEAN WITH A DOUBLE SAMPLING DESIGN


The text considers a case where $n$ individuals are drawn at random and, from these $n$ individuals, $\bar n$ individuals are assigned at random to each of $k$ sample interviewers ($k_A$ interviewers from the $A$-th group). For this sample we obtain responses $y_{Abc}$ under essential conditions Y.

From the $\bar n$ sample cases assigned to each interviewer we subsample at random $\bar n'$ cases, giving a total subsample of $n' = k\bar n'$. We also have a set of $L'$ interviewer groups (which may or may not be the same as the original interviewer groups). Of the subsample cases, $n_F'$ are available for interview by interviewers in the $F$-th of the $L'$ interviewer groups. We draw $k_F'$ ($= k'n_F'/n'$, where $k'$ is determined in advance) interviewers from the $F$-th interviewer group and assign at random to each of these sample interviewers $\bar n''$ ($= n'/k'$) individuals. The second set of interviewers obtains responses $z_{Fgc}$ under essential conditions Z from each of the $n'$ individuals in the subsample. We use as an estimate of the true population mean $\bar X$:


(84) 
The expected value and variance of $\bar y$ have already been derived.

The expected values and variances of $\bar y'$ and $\bar z'$ are identical with the values for a random sample of $n'$ cases drawn without reference to the sample of $n$ cases from which $\bar y$ is calculated. Thus:
where

$\bar Y_{ABC}$ = expected response value for the $C$-th individual in the population interviewed by the $B$-th interviewer in the $A$-th group under essential conditions Y.
where

$\bar Z_{FGC}$ = expected response value for the $C$-th individual in the population interviewed by the $G$-th interviewer in the $F$-th group under essential conditions Z.

Using the usual Taylor series approximations to the expected value and variance of $\hat z$ gives:
It will be noted that the bias of $\hat z$ is approximately $\bar Z - \bar X$. Thus, if $\bar Z$ is closer to $\bar X$ than is $\bar Y$, $\hat z$ may be a better estimate of $\bar X$ than $\bar y$, even though the variance of $\hat z$ exceeds the variance of $\bar y$.

As previously noted, the variances of $\bar y'$ and $\bar z'$ are the same as the variances when the subsample of $n'$ is drawn independently of the sample of $n$. Thus:
Unbiased estimates from the sample of $\sigma_y^2$ and $\sigma_{yI}$ have been derived above. Estimates of $\sigma_z^2$ and $\sigma_{zI}$ will have the same form as those for $\sigma_y^2$ and $\sigma_{yI}$, using the values $z_{Fgc}$. As an unbiased estimate of $\sigma_{yz}$ we have:
VARIANCE OF ESTIMATES OF RESPONSE BIAS


In the text we have considered the case where samples are drawn as described in the preceding part of this Appendix (double sampling design) and it can be assumed that $\bar Z = \bar X$. In this situation estimates of the bias of $\bar y$ as an estimate of $\bar X$ are:


$$r_y' = \bar y' - \bar z' \qquad (109)$$
and 
$$r_y = \bar y - \hat z \qquad (110)$$


$r_y'$ is an unbiased estimate of the response bias $R_y\ (= \bar Y - \bar Z)$. The “ratio” estimate $r_y$ is a consistent estimate of $R_y$, and the bias in $r_y$ will ordinarily be small. For the variance of $r_y'$ we have:


$$\sigma_{r_y'}^2 = \sigma_{\bar y'}^2 + \sigma_{\bar z'}^2 - 2\sigma_{\bar y'\bar z'} \qquad (111)$$
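A sketch of (109) and (111) under simplified assumptions: both methods are applied to the same $n'$ individuals, Method Y adds a constant bias plus independent noise, Method Z is unbiased ($\bar Z = \bar X$), and interviewer effects are ignored. The parameter values are hypothetical. Because the same individuals enter both means, the covariance term in (111) is large and removes the variance of the true values from the variance of $r_y'$.

```python
import random


def simulate_bias_estimate(n_sub=50, bias=1.5, sigma_x=10.0, sigma_ey=4.0, sigma_ez=2.0,
                           reps=5_000, seed=11):
    """Distribution of r_y' = ybar' - zbar' when both methods are applied to the same
    n_sub individuals (simplified, hypothetical model without interviewer effects)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sum_y = sum_z = 0.0
        for _ in range(n_sub):
            x = rng.gauss(50, sigma_x)
            sum_y += x + bias + rng.gauss(0, sigma_ey)   # Method Y: biased by `bias`
            sum_z += x + rng.gauss(0, sigma_ez)          # Method Z: unbiased, Z-bar = X-bar
        estimates.append((sum_y - sum_z) / n_sub)        # r_y' as in (109)
    mean_r = sum(estimates) / reps
    var_r = sum((e - mean_r) ** 2 for e in estimates) / reps

    var_ybar = (sigma_x**2 + sigma_ey**2) / n_sub
    var_zbar = (sigma_x**2 + sigma_ez**2) / n_sub
    cov_yz = sigma_x**2 / n_sub                 # the same individuals enter both means
    formula = var_ybar + var_zbar - 2 * cov_yz  # (111)
    return mean_r, var_r, formula


mean_r, var_r, formula = simulate_bias_estimate()
print(f"mean r_y' = {mean_r:.3f} (true bias 1.5), variance {var_r:.3f} vs (111) {formula:.3f}")
```

With these settings (111) gives $(116 + 104 - 200)/50 = 0.4$. Had $\bar y'$ and $\bar z'$ been based on independent samples, the covariance term would vanish and the variance would be $220/50 = 4.4$.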


Substituting values obtained above for $\sigma_{\bar y'}^2$, $\sigma_{\bar z'}^2$ and $\sigma_{\bar y'\bar z'}$ gives:
A CRITICAL ANALYSIS OF FARM EMPLOYMENT ESTIMATES*


D. GALE JOHNSON AND MARILYN CORN NOTTENBURG
University of Chicago 


ANALYSES of resource use in agriculture, or of the income of farm
families and workers, frequently require the use of estimates of 
agricultural employment. The measurement of agricultural employ- 
ment is difficult; on this point there is no question. The difficulty is 
related to several aspects of the farm labor force and its employment 
conditions—the large number of unpaid workers; the flexibility of 
farm work requirements which permits an individual to participate in 
work on more than one farm or in farm and non-farm work simul- 
taneously, and the difficulty of segregating household work from farm 
work. In addition, there are more farm firms than all other firms in 
the economy. To these problems must be added that of defining what 
actually constitutes a farm. 


1. COMPARISON OF FARM EMPLOYMENT ESTIMATES 


There are two continuing series of estimates of agricultural employ- 
ment. One is provided by the Bureau of Agricultural Economics and 
extends back to 1910. The other is maintained by the Bureau of the 
Census and starts in 1940. Though both series are given similar labels— 
the BAE series is called farm employment and the Census (or MRLF 
for Monthly Report on the Labor Force) estimates relate to persons 
employed in agriculture—they measure quite different things. This is 
apparent from the methods of data collection and the definitions of a 
farm worker, as well as from the actual estimates provided. 

Table 1 compares these annual estimates of employment since 
1940, while Table 2 gives monthly data since 1945. Three things 
stand out in Table 1. First, there is a large difference between the two 
estimates, a difference of roughly a quarter for the forties. Second, the 
difference in the estimates has been increasing, particularly since 1945. 
While the BAE estimates indicate an increase in employment between 
1945 and 1948, the Census does not reveal any post-war rise. Third, 
the estimates of family employment and hired employment contribute 
in about equal proportions (relative to their importance) to the differ- 
ence in the estimates of total employment. During the decade, the 
sources of the difference varied in importance, with the family employ- 
ment estimates exhibiting relatively large differences in the first part 





* The research on which this article is based has been done on a project financed by a grant for Agricultural Economics Research at the University of Chicago made by the Rockefeller Foundation.
of the decade, and the hired employment estimates varying most
sharply in the last part. 

The data in Table 2 indicate a sharp seasonal pattern in the differ- 
ences. In December, the two estimates are usually very close together 
and the same is true of January, though to a slightly less degree. The 
largest differences are in September in each year, reaching a maximum 
difference of 6.5 million in 1949 (about 80 per cent of the Census 
estimate). While the series agree on the months of minimum employ- 
ment (December, January and February), the Census places June as 
the peak month while the BAE finds the most people employed in 
September with October running a poor second. The Census gives 
July a close second position. 


TABLE 1

ANNUAL ESTIMATES OF AGRICULTURAL EMPLOYMENT
(000 omitted)

Year        BAE*     Census†   Difference

Total Employment
1940      11,671      9,540      2,131
1941      11,419      9,100      2,319
1942      11,458      9,250      2,208
1943      11,329      9,080      2,249
1944      11,055      8,950      2,105
1945      10,813      8,580      2,233
1946      11,092      8,320      2,772
1947      11,166      8,266      2,900
1948      11,080      7,973      3,107
1949      10,756      8,026      2,730

Family Employment
1940       8,866      7,060      1,806
1941       8,652      6,870      1,782
1942       8,689      6,660      2,029
1943       8,704      6,830      1,874
1944       8,643      6,950      1,693
1945       8,548      6,820      1,728
1946       8,766      6,650      2,116
1947       8,759      6,589      2,170
1948       8,595      6,227      2,368
1949       8,326      6,181      2,145

Hired Worker Employment
1940       2,805      2,480        325
1941       2,767      2,230        537
1942       2,769      2,590        179
1943       2,625      2,250        375
1944       2,412      2,000        412
1945       2,265      1,760        505
1946       2,326      1,670        656
1947       2,407      1,677        730
1948       2,485      1,746        739
1949       2,430      1,845        585

* BAE, “Farm Labor,” February 10, 1950, p. 9.
† Bureau of the Census, Current Population Reports, Labor Force, Series P-50, No. 2 and No. 13, and Series P-57, current issues.
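The size of the gap in Table 1 can be expressed as a share of either series. This sketch uses only the BAE totals and the Difference column, recovering the Census totals from the convention that a positive difference means the BAE figure exceeds the Census figure; the 1941 BAE total is taken as 11,419, the value consistent with the family and hired components.

```python
# BAE total employment and the BAE-minus-Census difference, 1940-1949 (thousands), from Table 1
YEARS = list(range(1940, 1950))
BAE_TOTAL = [11671, 11419, 11458, 11329, 11055, 10813, 11092, 11166, 11080, 10756]
DIFFERENCE = [2131, 2319, 2208, 2249, 2105, 2233, 2772, 2900, 3107, 2730]


def census_totals():
    """Census = BAE - Difference, since a positive difference means BAE exceeds Census."""
    return [bae - diff for bae, diff in zip(BAE_TOTAL, DIFFERENCE)]


def difference_shares():
    """Map year -> (difference / BAE total, difference / Census total)."""
    return {
        year: (diff / bae, diff / census)
        for year, bae, diff, census in zip(YEARS, BAE_TOTAL, DIFFERENCE, census_totals())
    }


for year, (of_bae, of_census) in difference_shares().items():
    print(year, f"{of_bae:.1%} of BAE", f"{of_census:.1%} of Census")
```

The difference runs from about 18 to 28 per cent of the BAE estimate, or about 22 to 39 per cent of the Census estimate, rising after 1945 as the text describes.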
TABLE 2

MONTHLY ESTIMATES OF FARM EMPLOYMENT BY BUREAU OF THE CENSUS AND THE BUREAU OF AGRICULTURAL ECONOMICS, JANUARY, 1945 THROUGH DECEMBER, 1949

(in millions)

[Table body illegible in source: monthly Census and BAE estimates and their difference.†]

* For source, see Table 1.
† A negative sign means Census estimate exceeds BAE.


2. SOURCES OF DIFFERENCES IN ESTIMATES 


Why are there these differences in the estimates? Apparently no one really knows in any detailed way; though some of the more important reasons have been stated, no precise estimate of the significance of each has been made.¹ There are three main sources of differences between the estimates, sampling errors aside. First, the BAE obtains its data from establishments or farms, the Census from households. Thus if an individual works on more than one farm during a survey week, he will be counted more than once by the BAE but only once by the Census. The BAE has indicated that the “additional count of workers attributable to this duplication is estimated at a minimum of a quarter of a million and may be considerably larger.”²

1 See Farm Labor, February 10, 1950, pp. 5-6; T.C.M. Robinson and Paul P. Wallrabenstein, “Estimates of Agricultural Employment and Wage Rates,” Journal of Farm Economics, XXXI (May, 1949), pp. 233-47; Louis J. Ducoff and Gertrude Bancroft, “Experiment in the Measurement of Unpaid Family Labor in Agriculture,” Journal of the American Statistical Association, XL (1945), pp. 205-13.

Second, the BAE has no age restriction on labor force participation, while the Census includes only individuals 14 years of age or older. The BAE estimates that this difference may amount to as much as two million during the summer peak, though its average importance throughout the year was not estimated.³

Third, the BAE includes any individual in the farm employment category who meets the minimum work requirements, regardless of the amount of work done elsewhere. The minimum work requirements are purely nominal for farm operators and hired workers: any work at all for farm operators and one hour per week for hired laborers. For unpaid family workers the minimum is 15 hours per week. The Census classifies an individual as employed in agriculture only if he meets the minimum work requirements and does not do more work outside agriculture than in it. The significance of this difference “may range from one-half million to a million in different seasons of the year,”⁴ according to the BAE.

The Bureau of the Census has made two surveys of multiple employment, in July, 1946 and January, 1943. In July, 1946, it found 760,000 persons with two or more jobs, at least one being in agriculture. In January, 1943, the same number was found, though a larger proportion of the multiple jobs involved nonagricultural work in January, 1943, than in July, 1946.⁵ However, about 20 per cent of the workers were temporarily absent from their additional jobs during the entire survey week, indicating that the establishment reports may have overestimated employment by about 650,000 for these two dates.

Other differences exist. The Census Bureau includes persons holding “nonfarm” types of jobs, some persons in agricultural processing activities, and other individuals who had a job but were not at work during the survey week because of illness, weather conditions, etc. In 1945 the latter group included 220,000 workers on the average each month; in 1948 the figure was 287,000.⁶ In 1940 the Census of Population indicated that 140,000 individuals who held jobs of nonfarm type or with agricultural processing firms were counted as employed in agriculture.⁷

2 Farm Labor, February 10, 1950, p. 6.

3 Ibid., p. 6.

4 Ibid. The minimum work requirement for unpaid family labor is the same for MRLF and BAE.

5 Bureau of the Census, Multiple Employment, July, 1946, Series P-S, No. 21, pp. 2 and 6.

6 Bureau of the Census, Series P-50, No. 2, p. 26; No. 13, p. 11.

The Census fails to include important segments of two general 
categories of workers in its sample—imported foreign workers and 
migratory workers not living in households. For 1945 the BAE es- 
timated that the first group included 120,000 different persons and 
the second about 200,000. It was estimated that the first group worked 167 days a year, while the second group averaged 125 days.⁸ Since the end of the war, the number of imported foreign workers has undoubtedly declined.

The BAE has recently revised its estimates of farm employment for 
the years 1931 to date. In fact, two revisions have been made, one 
published in 1949 and the other in 1950. These revisions were largely 
the result of a change in definition plus the availability of new data 
from enumerative surveys.⁹

Admittedly, the old series had certain inadequacies. As we under- 
stand it, the real bench marks used were the 1930 Census of Population 
estimates of employed gainful workers and similar data from the 1910 
and 1920 censuses. The 1930 estimates assumed that every farm had 
an operator worker, but those farms that had operators who worked 
off farms more than 250 days were deducted from the total number of 
farms. This excluded from the estimates all labor working on such farms.¹⁰ Thus in effect it can be said that the estimates, at least until the late thirties, were estimates of labor force participation (household basis), rather than estimates derived from establishment reports. Data from establishments (farms) were used only for intercensal periods.

The recent revisions of the BAE series now derive their levels as 
well as year to year changes from establishment reports, at least for 
the years since 1940. Between 1931 and 1940 some admixture of the 
two methods of approach seems apparent. The definition of a farm 
worker that was implied from 1935 through perhaps 1948 required 
that an individual work at least two days a week on the reporting 
farm. In changing to the new definition, workers in the BAE wrote: 
“The new definitions correspond much more closely than did the old ones to those currently used by the Census in its Monthly Report on the Labor Force.”¹¹ This strikes us as fundamentally incorrect. Presumably the similarity of two definitions can be judged only by how closely they describe or delimit a common attribute, which, in this case, is the level of farm employment. On this criterion, the old definition used by the BAE was more nearly the same as that used by the Census.¹²

7 Sixteenth Census, Population, Occupational Characteristics, pp. 208-09, 232-33.

8 L. J. Ducoff and M. J. Hagood, “Employment and Wages of the Hired Farm Working Force in 1945,” USDA, BAE, 1946, p. 39.

9 The data used as the basis of the revisions are presented in detail in a BAE series entitled Surveys of Wages and Wage Rates in Agriculture. Reports numbered 4, 7, 16, 20, 21, and 22 contain most of the applicable employment data.

10 Eldon E. Shaw and John A. Hopkins, Trends in Employment in Agriculture, 1909-36, WPA, National Research Project, Report No. A-8 (November, 1938), Appendices A, B, and F.

In February, 1950, the revised estimates published in January, 
1949, were revised again on the basis of data collected in 1948 but not 
available for the first revision. The latest revision again increased the 
level of estimated employment. The first revision added about 600,000 
to 750,000 to the original estimates of farm employment for 1944 
through 1948. The second revision resulted in further increases of 
300,000 to 400,000 for the same years. Thus the latest revised es- 
timates are about 1,000,000 higher than estimates made prior to 1948 
for the decade of the forties. 

Even after allowance is made for the rough estimates of differences enumerated above, the remaining difference between farm employment as reported by the Census and by the BAE was of the order of 1,500,000 in 1948. This discrepancy is about 15 per cent and is large enough to
cause concern among research workers and others utilizing data on farm 
employment. In fact, the difficulties are underemphasized by making 
the adjustments. Ordinarily, as one would expect, the BAE series is 
taken as published and so used. Two examples of its use by workers in 
the BAE may be indicated as illustrations. In a statistical compilation 
called “Net Farm Income and Parity Report,” which is now published 
annually, a comparison is made between the average income from 
agriculture and government payments per worker and the average 
wage per industrial worker. For 1948, using employment data pub- 
lished before the recent revisions, it was estimated that the average 
income in agriculture per worker was $1,963 compared to $2,707 for 
the wage income of industrial workers. Using the most recent employment estimates made by the BAE gives a figure of $1,787.¹³ If the Census estimate of employment is used, a figure of about $2,475 results. This hiatus is certainly too large.

11 Ibid., p. 244.

12 The wording of the new definition does correspond fairly closely to the one used by the Census, with two important exceptions. The Census counts only persons 14 years of age or older, while the BAE counts all workers regardless of age. The second exception, and perhaps the more important one, is that the Census requires that a worker both meet the minimum requirements and spend the major portion of his working time in agriculture in order to be counted as a farm worker.

In an industry such as agriculture, where a worker may spend only a part of his working time on a particular farm, virtually any definition of farm workers used by an establishment type survey will result in a different level of farm employment than that obtained by a household survey using the same or a different definition. However, by judicious choice of definitions, it is possible to obtain fairly similar results. The old BAE definition obtained a level of farm employment which was closer to the Census data than that obtained by the new definition, since the criterion of two days’ employment probably excluded a considerable number of operators doing farm work incidentally as well as some hired workers working on more than one farm.

The BAE has constructed a series of indexes of output and gross 
farm production per worker. Based on the old BAE estimates of farm 
employment, gross farm production per worker increased from 97 to 
111 between 1920 and 1940. Using the new estimates of labor employ- 
ment we get a change from 97 to about 101. This implies that produc- 
tion techniques were essentially static during this twenty year period 
and that the improvements in seeds, yields per acre, and increased pur- 
chases of inputs from the rest of the economy were entirely offset by 
declining productivity of land or reduced capacity of the labor force. 
Such a conclusion seems to the writer to be highly untenable, yet no 
other conclusion is possible from the data on employment and output. 

Our discussion to this point may be summarized. The MRLF pre- 
sumably provides an estimate of the number of employed persons 
(though not necessarily all are at work) whose major work activity 
is in agriculture. The BAE estimate can best be described as the num- 
ber of different farm jobs that are filled at a given time. The BAE 
estimate, though designated as total farm employment or the number
of persons at work on farms,¹⁵ bears an unknown and variable relation
to the number of people at work on farms because of the multiple 
counting of individuals working on more than one farm. The BAE 
series probably has lost any historical continuity that it may have had 
prior to the recent revisions of the estimates for 1931 to date. This 
creates a serious situation for research workers since the BAE es- 
timates are the only official estimates of farm employment prior to 
1940. 

Many uses of employment estimates, including important uses 
made by workers in the BAE, require a very different definition of farm 
employment than is now implied by the current BAE estimates for 
recent years. An estimate of the amount of work done, in terms of 





13 In the Farm Income Situation, August, 1950, the estimates of average income from agriculture
and government payments were revised for 1935 to date on the basis of the current estimates of farm
employment. See ibid., page 28.

14 USDA, BAE, Farm Production Practices, Costs and Returns, Stat. Bul. No. 83, October, 1949, p.
67 and Agricultural Outlook Charts, 1950, p. 19. In the 1951 edition of Agricultural Outlook Charts, 
the series included in 1950 has been dropped. A new series called output per man-hour has been in- 
cluded (see ibid., p. 5). This series, however, was not derived from any estimate of farm employment 
as that term is commonly used. In other words, the man-hours of farm work is not the actual number 
of hours worked on farms, but is an estimate of the total time required for an average adult male worker 
to perform the tasks that normally would be associated with the farm output produced during the year 
assuming a given standard rate of performance. The standards for computing man-hours change from 
time to time to reflect the introduction of machinery, new seeds, etc. 

15 See Farm Labor, Feb. 10, 1950, p. 1 and ibid., Dec. 11, 1950.
hours, days, months, or years, would be much more appropriate
than the number of jobs filled for such uses as estimates of farm labor 
income, output per worker or changes in labor use in agriculture. 

The remainder of this paper is devoted to the problem of estimating 
the amount of farm work done. The accepted measure of work done 
is the number of years of work. The actual composition of the labor 
force is taken as given and no attempt has been made to adjust for 
changes in composition or to express the work in some man-equivalent 
unit. Likewise no attempt has been made to adjust for the length of 
the work week as long as certain minimum requirements are met. 


3. ESTIMATES OF FARM WORK DONE 


Neither the MRLF nor the BAE estimates can be used directly as 
indicators of the amount of farm work done. Though the MRLF ap- 
pears to provide the most accurate estimate because it avoids duplicate 
counting, it may underestimate farm work done. First, the major
activity criterion will estimate the amount of farm work only if workers 
are classified into industry categories in roughly the same proportions 
as the actual number of hours of work were performed in each industry. 
Second, the MRLF age limitation excludes a part of the working force 
and the MRLF household sample may miss certain workers, such as 
imported or migratory workers. Finally, the MRLF may fail to obtain 
adequate representation of unpaid family workers. 


a. Hired farm work 


The BAE has made two different types of sample surveys in recent 
years that provide estimates of the total number of days of hired 
work on farms. One set of surveys used the MRLF sample and ob- 
tained estimates from all persons reporting any hired farm work in the 
past calendar year of the number of days of such employment. The 
second group of surveys used an establishment sample and obtained 
estimates from employers of the number of days of hired work done 
on their farms. Because employment was measured in days, there is 
little or no problem of double counting of work done. 

In Table 3 the estimates of the number of days of hired employment 
on U. S. farms are given for the two types of surveys. For 1947 and 
1948 the results are remarkably close together. The 1946 Census 
(household) survey is regarded as suspect by workers in the BAE because the
"1946 levels of employment appear low . . . ."¹⁶ Both series are subject





16 The Hired Farm Working Force of 1947, Foreword.
TABLE 3

ESTIMATES OF TOTAL NUMBERS OF DAYS OF HIRED FARM EMPLOYMENT FROM
HOUSEHOLD AND ESTABLISHMENT REPORTS, 1945 TO 1949
(in millions of days)

                    Household Report
        Workers included                     Establishment
Year    in Survey*             Total         Report||

1945    403*
1946    313†                   392†          481
1947    360‡                   421‡          447
1948    390§                   462§          463
1949    372§                   448§

* L. J. Ducoff and M. J. Hagood, Employment and Wages of the Hired Farm Working Force in 1946,
p. 39.

† Ducoff and Hagood, Farm and Nonfarm Wage Income of the Hired Farm Working Force in 1946,
pp. 14, 15, and 22. The number of days' work done by persons excluded or not covered in the survey
was assumed to be 40 for children under 15 and 125 for all other groups.

‡ Ducoff and Hagood, The Hired Farm Working Force of 1947, pp. 7 and 16. The number of days'
work done by workers not covered in the survey was estimated in the same way as in footnote †.

§ Gladys K. Bowles, L. J. Ducoff, and M. J. Hagood, The Hired Farm Working Force, 1948 and
1949, pp. 14 and 45.

|| BAE, Surveys of Wages and Wage Rates in Agriculture, Report No. 21, p. 39 and Report No. 22,
p. 36.


to memory bias, and the household estimates have been adjusted upward by about 20 per cent to include such workers as children under 14,
prisoners of war, imported workers, and migratory workers not included
in the sample.

Table 4 gives estimates of the number of days of farm work done by 
hired workers if it is assumed that 240 days a year represent full time 
work and that the employment estimates of the MRLF and BAE 
represented full time workers. This procedure does not appear to be 
too unreasonable for the MRLF. For 1948 and 1949 it is reported
that the hours worked by persons at work and classified in
agriculture averaged about 50 per week.¹⁷ Because of the double
counting in the BAE estimates, this is a more questionable procedure 
and was done only to reflect the possible degree of over-estimate of 
work done. 

In preparing Table 3, it was estimated that the household report 
excluded or missed about 60 million days of work each year. If this 
were added to the MRLF estimate, the latter is increased by about 15 
per cent. Taking this adjusted MRLF estimate as a reasonably ac- 





17 Bureau of the Census, Annual Report of the Labor Force, 1949, Series P-20, No. 19, p. 6. Since 
about 2.5 per cent of the workers included had a job, but were not at work, the average hours of work 
should be reduced by about one hour. Some of the hours are worked at nonfarm jobs.





200 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1951 


curate estimate of the amount of farm work implies that the BAE 
estimate, if taken as a measure of labor time, involves an over-estimate 
of about 20 to 23 per cent. 


TABLE 4

FULL TIME EQUIVALENT NUMBER OF DAYS' WORK IN MRLF AND
BAE ESTIMATES OF FARM EMPLOYMENT
(millions of days)

Year     MRLF*     BAE*

1945     422       544
1946     401       558
1947     412       578
1948     419       599
1949     443       583

* Estimated by multiplying employment estimates by 240 to indicate the approximate number of
days' work that the MRLF and BAE employment estimates would reflect if all workers were full time
farm wage workers.
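The comparison of the adjusted MRLF figure with the BAE figure can be reproduced from Table 4 in a few lines. This is a minimal sketch. It assumes, as the text does, that a flat 60 million missed days is added to the MRLF figure in every year. The function names are hypothetical. The per-year ratios it prints vary from year to year, so they should not be read as exactly reproducing the authors' rounded "20 to 23 per cent."

```python
# Full-time-equivalent days of farm work, in millions (Table 4):
# employment estimates multiplied by 240 working days a year.
FULL_TIME_DAYS = 240
MRLF_DAYS = {1945: 422, 1946: 401, 1947: 412, 1948: 419, 1949: 443}
BAE_DAYS = {1945: 544, 1946: 558, 1947: 578, 1948: 599, 1949: 583}

# Assumption taken from the text: the household survey excluded or
# missed about 60 million days of hired work in each year.
MISSED_DAYS = 60


def full_time_workers(days_millions: float) -> float:
    """Convert millions of full-time-equivalent days back to millions of workers."""
    return days_millions / FULL_TIME_DAYS


def mrlf_uplift_percent(year: int) -> float:
    """Per cent increase in the MRLF figure when the missed days are added."""
    return 100.0 * MISSED_DAYS / MRLF_DAYS[year]


def bae_excess_percent(year: int) -> float:
    """Per cent by which the BAE figure exceeds the adjusted MRLF figure."""
    adjusted_mrlf = MRLF_DAYS[year] + MISSED_DAYS
    return 100.0 * (BAE_DAYS[year] / adjusted_mrlf - 1.0)


for year in sorted(MRLF_DAYS):
    print(f"{year}: {full_time_workers(MRLF_DAYS[year]):.2f} million MRLF workers, "
          f"MRLF +{mrlf_uplift_percent(year):.1f}%, "
          f"BAE excess {bae_excess_percent(year):.1f}%")
```

For 1948, for example, 60 million days raises the MRLF figure of 419 by about 14 per cent. This is the source of the text's "about 15 per cent."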

Our procedure has involved the assumption that the annual surveys 
of hired farm employment from the household and establishment 
samples give roughly the same estimates when adjustments are made 
in the household sample for workers excluded by definition, such as 
children, and workers not adequately reflected in the sample, such as 
migratory workers. The annual surveys are obviously subject to 
memory bias, but a comparison of the MRLF estimate and the annual 
estimate from the same samples does not indicate this to be very 
great, if we may assume that persons with multiple jobs distribute 
their hours of work in the same proportion as the workers are dis- 
tributed among the various employed categories. This does not seem 
to be too unreasonable an assumption for hired farm workers.¹⁸


b. Family Employment and Work Done 


The MRLF estimates of the labor input of family members exclude
the work done by children under 14. Since children do contribute ef- 
fectively to farm work at ages younger than 14, they should be in- 
cluded. Though there are farm jobs that children under 14 cannot do, 
there are many jobs that such children can perform as effectively as 
an adult. These include such tasks as driving a tractor where the operation is a routine one, caring for poultry and small livestock, and milking
cows.

18 The annual survey of the hired farm working force for 1949 gives data to support the assumption.
Workers were classified by chief activity during the year. There were 308,000 with farm wage work as
the chief activity but who did some nonfarm work, and 512,000 whose chief activity was nonfarm work
but who did some farm work. The latter group performed about 18.4 million days of hired farm work
and the former group did 18.6 million days of nonfarm wage work. See The Hired Farm Working Force,
1948 and 1949, pp. 10 and 14.

So far as we know, there are no accurate estimates of the labor per- 
formed by farm children under 14. Anything that can be said must be 
based on general knowledge of agriculture and any insight that such 
data as school enrollment provide. In recent years about 96 per cent 
of all farm children 10 to 13 years of age have been enrolled in school 
and the school year generally covers nine months. While a small num- 
ber may contribute as much as 15 hours per week during the school 
year, most do sufficient work to be included only during the summer 
time. 

During the past decade, the number of children 10 to 13 has de- 
clined from about 2.7 to 2.3 million. If all of them worked full time 
three months out of the year, their work contribution would equal
about 600,000 full-time workers. But since almost half are girls and girls do not contribute much to farm work except in the South and specialty crop
areas, this estimate is much too high. Furthermore, even in the South
where more than a million children do some work in the cotton harvest,¹⁹ the period when children may effectively help in farm crop
work is not as long as three months. A figure of 300,000 to 400,000 
seems more reasonable. 
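The 600,000 figure is full-time-equivalent arithmetic: a child who works full time for three months contributes one quarter of a full-year worker. The sketch below assumes exactly three months of full-time work and the 2.3 to 2.7 million range given in the text. The function name is hypothetical.

```python
def full_year_equivalents(children: float, months_full_time: float) -> float:
    """Full-year worker equivalents if each child works full time for the given months."""
    return children * months_full_time / 12.0


# Range of farm children aged 10 to 13 during the decade, from the text.
low = full_year_equivalents(2_300_000, 3)
high = full_year_equivalents(2_700_000, 3)
print(f"{low:,.0f} to {high:,.0f} full-year equivalents")  # 575,000 to 675,000
```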

Since both the MRLF and the BAE estimates of family employment 
have seasonal patterns that are reasonably consistent within each 
series from year to year, it was thought that a comparison of the 
differences in average employment between March and April and June 
through October might give some indication of the number of children 
included in the BAE estimates during the summer months. For 1947 
the seasonal increase in the BAE exceeded that of the MRLF by a 
monthly average of 200,000 and by about 350,000 in 1949. But in 
1948 the MRLF seasonal change exceeded the BAE by 200,000. Such 
variations in seasonal change are within the limits of sampling vari- 
ability and other explanations of the smallness of the differences may 
be pertinent. 

A second possible source of under-estimation may be the counting of 
unpaid family workers, especially females. A study of Ducoff and 
Bancroft in April, 1944 indicated that the MRLF was failing to classify 
as members of the labor force 1,310,000 females and 180,000 males 
who, in April, 1944, did 19 or more hours of farm work. Subsequent 
to this time, the MRLF revised its estimates of family employment 

19 See Carl C. Taylor, et al., Rural Life in the United States, New York, Alfred A. Knopf, 1949, p.
There is no indication in the source, however, that the children were all under 13. Many may have
been older.
for April, 1944, from 5,000,000 males and 700,000 females to 5,300,000
males and 1,170,000 females. The revision added 470,000 females to the 
employed category. This implies that 840,000 females may still have 
been participating in farm work and would not have been so classified 
by the questionnaire now in use. Since the study was carefully done, 
some credence must be given to the possibility that the MRLF under- 
estimated unpaid family labor by as much as 800,000 in the mid- 
forties. 
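The 840,000 figure follows by subtracting the revision's additions from the number the Ducoff-Bancroft study found missing. The sketch below covers only the female figures and uses a hypothetical helper name.

```python
def remaining_gap(found_missing: int, before_revision: int, after_revision: int) -> int:
    """Workers a special study found missing, less those the revision added."""
    added = after_revision - before_revision
    return found_missing - added


# Unpaid female family workers, April, 1944 (figures from the text).
female_gap = remaining_gap(found_missing=1_310_000,
                           before_revision=700_000,
                           after_revision=1_170_000)
print(f"{female_gap:,}")  # 840,000
```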

There is some reason for believing the degree of under-estimation 
may have been reduced somewhat by the late forties. During a time 
of declining employment of males and an increase in farm income, the 
MRLF now indicates an increase of about 40 per cent in the number 
of female unpaid family workers in agriculture. In addition the number 
of female farm residents with nonfarm jobs increased by 350,000 or 
by more than 50 per cent between April, 1940 and April, 1949.²⁰ This
has resulted in an increase of the proportion of females over 14 in the 
labor force from 14.8 per cent in April, 1940 to 22.9 per cent in April, 
1949. For the rest of the resident groups, the female labor force par- 
ticipation rate increased only slightly—from 31.2 per cent to 32.5 per 
cent between the two dates. 

Though there were perhaps 6,000,000 farms on the average during 
the forties by the Census of Agriculture definition, the MRLF generally 
classified roughly 4.5 to 4.8 million persons as farm operators. This does 
not seem to be an under-estimate. The 1940 Census of Agriculture 
indicated that 1,747,000 farm operators reported an average of 137 
days of work off the farm with 943,000 of this number reporting over 
100 days of off farm work for 1939.²¹ A sample study made by the
BAE for 1943 indicated that 2,980,000 farm operators did some off
farm work and that 1,390,000 did 100 or more days of work.²² The
1945 Census of Agriculture indicates that 1,660,000 farm operators
worked off farms and 1,130,000 did so for more than 100 days.²³

Does the major work activity criterion of the MRLF result in an 
under-estimate of farm work done by family members? Where both 
of the jobs are in agriculture, no problem arises, of course. But where 
one job is in non-agriculture, the distribution of workers between 
agriculture and non-agriculture could be quite different than the dis- 
tribution of work. We did not find this to be true for hired workers as 
noted above. 





20 See Series Census-BAE, No. 14, Table 3. The number of unemployed declined by 100,000 between
the two dates.

21 U.S. Census of Agriculture, 1945, Vol. II, General Report, p. 236.

22 Series Census-BAE, No. 6, Table 6. An estimated 850,000 worked 250 days or more at off farm
work.

23 U.S. Census of Agriculture, 1945, Farms and Farm Characteristics by Value of Products, p. xxvi.
In April, 1944, 1,110,000 individuals who were members of households containing a farm operator and who were classified as nonfarm
workers contributed some farm work during the survey week, and
610,000 contributed 19 or more hours.²⁴ The total number of farm residents whose major activity was nonfarm work was 2,680,000.²⁵

In July, 1946, 440,000 workers had a primary job in non-agriculture
and a secondary job in agriculture; 240,000 had the reverse combination.²⁶ If both groups of individuals worked the same proportion of
their time in the primary job, some under-estimate of the time worked
in agriculture would occur.²⁷

The major work activity criterion probably does result in some 
under-estimate of the amount of time worked in agriculture. More 
persons with jobs in both agriculture and non-agriculture have their 
primary jobs in non-agriculture than in agriculture. The importance of 
the under-estimation is a function of the differential in the classification 
of persons into their primary job and the actual division of time 
worked. We have some insight relative to the former, but little or none 
into the latter. 
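The reallocation described in footnote 27 can be sketched as follows. Each of the 680,000 multiple jobholders is split between industries in proportion to time worked, with every worker assumed to spend the same share of time at the primary job. The two shares tried (two-thirds and three-fourths) are the footnote's illustrative assumptions, not measured values, and the function name is hypothetical.

```python
from typing import Tuple


def split_by_time(primary_nonag: float, primary_ag: float,
                  share_primary: float) -> Tuple[float, float]:
    """Return (agricultural, non-agricultural) worker-equivalents when every
    multiple jobholder spends `share_primary` of his time at the primary job."""
    share_secondary = 1.0 - share_primary
    ag = primary_nonag * share_secondary + primary_ag * share_primary
    nonag = primary_nonag * share_primary + primary_ag * share_secondary
    return ag, nonag


# July, 1946: 440,000 with primary job outside agriculture, 240,000 the reverse.
for share in (2 / 3, 3 / 4):
    ag, nonag = split_by_time(440_000, 240_000, share)
    print(f"primary share {share:.2f}: agriculture {ag:,.0f}, "
          f"non-agriculture {nonag:,.0f}")
```

Under either assumption, more worker-equivalents fall in agriculture than the 240,000 whose primary job is agricultural. This is the direction of the under-estimate discussed above.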


c. Summary 


The MRLF may provide a rather good estimate of the amount of 
work done (time spent) by hired workers in agriculture except for the
rather minor exclusion of children under 14 and the failure to enumerate
families whose household status is difficult to determine. Relying 
largely on estimates made by workers in the BAE, the workers ex- 
cluded by definition or by difficulties in sample design may contribute 
about 15 per cent as much work as those included in the MRLF 
estimate. 

Our analysis of the self-employed and unpaid family categories indicates that the exclusion of children under 14 might result in an under-estimate of 5 per cent. In the unpaid family category, there is evidence
that many females were not counted that should have been. However, 
the marked increase in labor force participation by female farm resi- 
dents since 1940 may indicate that most of this shortcoming has now 
been overcome. If true, it means that the estimates of the early for- 
ties are no longer comparable to the current ones. 





24 Ducoff and Bancroft, op. cit., p. 210.

25 Joint Committee on the Economic Report, Employment and Unemployment (Washington, 1949)
p. 14.

26 Bureau of the Census, Multiple Employment: July, 1946, Series P-S, No. 21, Table 2.

27 If the time division were two-thirds in the primary and one-third in the secondary job, division
on the basis of time worked of these 680,000 workers would have placed 307,000 in agriculture and 373,000
in non-agriculture. If the time spent in the primary job were three-fourths of the total the division
would have been 390,000 in non-agriculture and 290,000 in agriculture.
The estimate of farm operators who devote the major share of their
time to farming seems to us to be a reasonable one. It is in line with 
what one would expect from Census reports of off-farm work by farm 
operators. 

The MRLF may under-estimate farm work done for unpaid family 
and self-employed workers through the use of the major work activity 
criterion. The size of this under-estimate cannot be readily determined 
from available data. 

In concluding, it should be noted that the MRLF includes in the 
estimates of employed workers an annual average of almost 200,000 
workers with a job but not at work during the survey week. Normally 
these workers would not be included in any estimate of farm work 
done. The MRLF also includes a number of persons with “nonfarm” 
types of jobs on farms and persons employed in some processing ac- 
tivities—work not usually considered a part of the farm sector. In 
March, 1940, these numbered about 140,000. 


CONCLUSIONS 


The present BAE farm employment series fails to provide a reasonably accurate indication of the amount of farm work, either at any
given time or in terms of trend. Changes in the basic economic structure of agriculture alter the relation between the number of different
jobs and the amount of work done. The definition of farm employment
used and the method of obtaining the data used in the BAE series
make it impossible to derive an estimate that is satisfactory for the
most frequent uses of farm employment estimates.

The MRLF estimates have limitations, too. This is true, appar- 
ently, regardless of the use to which one puts the data. The series fails 
to include in the labor force certain individuals who contribute sub- 
stantially to farm work. The sample probably fails to obtain data for 
workers without a fixed household. And by definition, workers under 
14 are excluded. But despite these comments, the MRLF series is not 
too unreliable as an indicator of farm work done. 

One should not criticize any employment series because it does not 
provide an estimate that exactly meets the definition of employment 
that one may have in mind. There are many definitions of employ- 
ment, each appropriate to some particular use. The MRLF estimates, 
sampling and response problems aside, the number of people depend- 
ent upon agriculture for their major activity. This, we believe, is an 
estimate useful for many policy and research purposes. The definition 
of employment emerging from the definitions and source of data used 
by the BAE does not impress us as being very satisfactory for any
really important policy or research problem. The BAE series reveals 
how many different farm jobs there are, without revealing how many 
different persons are involved or how much work is performed at 
these jobs. 

There is undoubtedly a need to continue obtaining estimates of 
farm employment by relying upon establishment reports. The MRLF 
sample is too small to permit regional estimates of farm employment 
and the sampling error of the national estimate is relatively large.²⁸
Though additional expense would be involved, the usefulness of es- 
tablishment reports would be greatly enhanced if information were 
obtained on the time worked on each farm by each person working. 
The total time worked could then be translated into days, weeks, or 
months. It might still be desirable to retain some minimum work time 
per week for unpaid family workers as a means of excluding relatively 
incidental work. Since the BAE occasionally estimates the length of 
work day for operators and hired workers, such data are apparently 
obtainable. 

The MRLF estimates of farm employment would be made more 
useful if there were an occasional tabulation of the actual distribution 
of hours by major industry worked by persons with multiple jobs. 

During the past decade, there have been remarkable increases in 
our knowledge of farm labor and the farm working force. The BAE 
has provided valuable data on the amount of farm hired work, on the 
composition and characteristics of the hired work force, the amount 
of nonfarm work done, and the occupation of farm hired workers in 
some subsequent period. Most of the new material has related to the
hired farm workers; we need to know much more about the characteristics and activities of the operator and his family as these relate to
their work activities. Because of the close relation between the house- 
hold and the firm, such data can be obtained from establishment 
reports. 





28 There is one chance out of twenty that a complete census would differ from the sample estimate 
by about one million. See Series P-50, No. 2, p. 10. 





ESTIMATING PARAMETERS OF LOGARITHMIC-NORMAL 
DISTRIBUTIONS BY MAXIMUM LIKELIHOOD 


A. C. COHEN, JR.
The University of Georgia 

This paper is concerned with the three-parameter logarith- 
mic normal distribution; i.e., the general distribution in which 
the terminus is unknown. Maximum likelihood equations for 
estimating population parameters from random samples are 
derived, and an iterative method for their solution is out- 
lined. Variances and covariances of these estimates are ob- 
tained from the information matrix. An illustrative example 
is included. 


1. INTRODUCTION 


THE logarithmic normal distribution provides a useful theoretical
model for studying a number of biological populations, certain
economic populations involving income distributions, and others in 
which the standard deviation of individual observations is approxi- 
mately proportional to the magnitude of the observations. Samples 
from distributions of this type have previously been studied by various 
writers including Kapteyn [1], Wicksell [2], McAllister [3], Gumbel 
[4], Jenkins [5], Yuan [6], and Finney [7]. Kapteyn employed a method 
based on selected points to estimate population parameters from ran- 
dom samples. Wicksell, Gumbel, and Yuan used the method of
moments. Finney published maximum likelihood estimates of the
mean and standard deviation for the case in which the terminus of 
the distribution is known. In the present paper, we obtain maximum 
likelihood estimates which are applicable to the more general situation 
in which the terminus is unknown and must therefore be estimated 
from the sample. The probability density function for the general 
three-parameter logarithmic normal distribution considered here is 





$$f(x) = \frac{1}{\gamma\sqrt{2\pi}\,(x-\alpha)}\exp\left[-\frac{1}{2\gamma^{2}}\log^{2}\frac{x-\alpha}{\beta}\right], \qquad x > \alpha. \tag{1}$$

It derives its name from the fact that the standard normally distributed variable t, with mean zero and variance unity, is related to the
observed variable x by the equation

$$t = \frac{1}{\gamma}\log\frac{x-\alpha}{\beta}. \tag{2}$$


2. MAXIMUM LIKELIHOOD EQUATIONS


If x is distributed according to (1), the likelihood function for a 
random sample of size n from this population is 


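$$P(x_1, x_2, \ldots, x_n) = \left(\frac{1}{\gamma\sqrt{2\pi}}\right)^{n}\prod_{i=1}^{n}\left(\frac{1}{x_i-\alpha}\right)\exp\left[-\frac{1}{2\gamma^{2}}\sum_{i=1}^{n}\log^{2}\frac{x_i-\alpha}{\beta}\right]. \tag{3}$$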


Taking logarithms of (3), differentiating, and equating to zero, we ob- 
tain the maximum likelihood equations 


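$$\begin{aligned}
\frac{\partial \log P}{\partial \alpha} &= \left(1-\frac{\log\beta}{\gamma^{2}}\right)\sum_{i=1}^{n}\frac{1}{x_i-\alpha} + \frac{1}{\gamma^{2}}\sum_{i=1}^{n}\frac{\log(x_i-\alpha)}{x_i-\alpha} = 0,\\
\frac{\partial \log P}{\partial \beta} &= \frac{1}{\beta\gamma^{2}}\sum_{i=1}^{n}\left[\log(x_i-\alpha)-\log\beta\right] = 0,\\
\frac{\partial \log P}{\partial \gamma} &= -\frac{n}{\gamma} + \frac{1}{\gamma^{3}}\sum_{i=1}^{n}\left[\log(x_i-\alpha)-\log\beta\right]^{2} = 0.
\end{aligned} \tag{4}$$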





From the second equation of (4) we have 
$$\log\beta = \frac{1}{n}\sum_{i=1}^{n}\log(x_i-\alpha). \tag{5}$$


Upon substituting this value in the third equation of (4) and solving
for γ², we obtain

$$\gamma^{2} = \frac{1}{n}\sum_{i=1}^{n}\log^{2}(x_i-\alpha) - \left(\frac{1}{n}\sum_{i=1}^{n}\log(x_i-\alpha)\right)^{2}. \tag{6}$$


When the results of (5) and (6) are substituted in the first equation of 
(4), we have 


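$$\lambda(\alpha) = \left[\sum_{i=1}^{n}\frac{1}{x_i-\alpha}\right]\left[\sum_{i=1}^{n}\log(x_i-\alpha) - \sum_{i=1}^{n}\log^{2}(x_i-\alpha) + \frac{1}{n}\left(\sum_{i=1}^{n}\log(x_i-\alpha)\right)^{2}\right] - n\sum_{i=1}^{n}\frac{\log(x_i-\alpha)}{x_i-\alpha} = 0. \tag{7}$$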





We must solve (7) to obtain the maximum likelihood estimate α̂, and
subsequently substitute this value in (5) and (6) to determine estimates β̂ and γ̂.
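Equations (5) through (7) translate directly into code. The sketch below is written from the equations as given here, not from Cohen's computing procedure (he worked from tables of logarithms), and no particular numerical output is claimed. Because (5) and (6) maximize the likelihood over β and γ for a fixed α, λ(α) equals −nγ̂² times the derivative of the resulting profile log-likelihood. This fact gives a convenient numerical check. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Sample of Section 5 (20 observations).
SAMPLE: List[float] = [
    148.290, 144.328, 174.800, 168.554,
    184.101, 166.475, 131.375, 145.788,
    135.880, 137.338, 164.304, 155.369,
    127.211, 132.971, 128.709, 201.415,
    133.143, 155.680, 153.070, 157.238,
]


def log_sums(x: Sequence[float], alpha: float) -> Tuple[float, float, float, float]:
    """Sums of y, y**2, 1/(x - alpha) and y/(x - alpha), where y = log(x - alpha)."""
    s_y = s_yy = s_inv = s_y_inv = 0.0
    for xi in x:
        d = xi - alpha
        if d <= 0:
            raise ValueError("alpha must lie below the smallest observation")
        y = math.log(d)
        s_y += y
        s_yy += y * y
        s_inv += 1.0 / d
        s_y_inv += y / d
    return s_y, s_yy, s_inv, s_y_inv


def beta_gamma_hat(x: Sequence[float], alpha: float) -> Tuple[float, float]:
    """Equations (5) and (6): beta and gamma estimates for a trial alpha."""
    n = len(x)
    s_y, s_yy, _, _ = log_sums(x, alpha)
    mean = s_y / n
    return math.exp(mean), math.sqrt(s_yy / n - mean * mean)


def lambda_fn(x: Sequence[float], alpha: float) -> float:
    """Equation (7): [sum 1/(x-a)][Sy - Syy + Sy**2/n] - n * sum y/(x-a)."""
    n = len(x)
    s_y, s_yy, s_inv, s_y_inv = log_sums(x, alpha)
    return s_inv * (s_y - s_yy + s_y * s_y / n) - n * s_y_inv


def profile_loglik(x: Sequence[float], alpha: float) -> float:
    """Log of (3) with (5) and (6) substituted, additive constants dropped."""
    _, gamma = beta_gamma_hat(x, alpha)
    return -len(x) * math.log(gamma) - sum(math.log(xi - alpha) for xi in x)


for a in (110.0, 115.0, 117.0, 118.0, 120.0):
    print(f"alpha = {a:6.1f}   lambda = {lambda_fn(SAMPLE, a): .4f}")
```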
If desired, estimates of the mean, standard deviation, third standard
moment, and median of x can be expressed as functions of α, β, and γ
with the aid of the following relations from Yuan (loc. cit.):


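$$\begin{aligned}
\mu_x &= \alpha + \beta\,\omega^{1/2},\\
\sigma_x &= \beta\sqrt{\omega(\omega-1)},\\
\alpha_{3} &= (\omega+2)\sqrt{\omega-1},\\
Me_x &= \alpha + \beta,
\end{aligned} \tag{8}$$

where ω = e^{γ²}.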
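Relations (8) can be evaluated directly. As a check, the parameters used to construct the sample of Section 5 (α = 100, β = 50, γ = 0.4) should give a mean of about 154.1644, a standard deviation of about 22.5620, and a median of 150. The function name below is hypothetical.

```python
import math
from typing import Dict


def lognormal_summary(alpha: float, beta: float, gamma: float) -> Dict[str, float]:
    """Mean, standard deviation, third standard moment and median from (8)."""
    w = math.exp(gamma ** 2)              # omega = e^(gamma^2)
    return {
        "mean": alpha + beta * math.sqrt(w),
        "sd": beta * math.sqrt(w * (w - 1.0)),
        "alpha3": (w + 2.0) * math.sqrt(w - 1.0),
        "median": alpha + beta,
    }


# Population values used to construct the sample of Section 5.
pop = lognormal_summary(100.0, 50.0, 0.4)
print({k: round(v, 4) for k, v in pop.items()})
```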
3. ON THE SOLUTION OF THE EQUATION λ(α) = 0


Equation (7) can be satisfactorily solved by inverse interpolation
once a sufficiently narrow interval (α₁, α₂) has been located such that
λ(α₁) > 0 and λ(α₂) < 0. The summations appearing in (7) can be
evaluated from the sample data with the aid of a suitable table of
logarithms. A value slightly less than the smallest sample observation
x₁ will usually be satisfactory as a first approximation to α̂. As the
sample size is increased, this approximation will approach closer to
the corresponding population parameter. It is interesting to note that
λ(α) > 0 for values of α sufficiently smaller than x₁. Furthermore,
λ(α) → 0 as α → −∞, and λ(α) → −∞ as α → x₁. A graph of this function
for the sample considered in Section 5 will be found in Figure 1.
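The text describes a single inverse interpolation once a narrow interval is found. The sketch below repeats that step until the function value is negligible, which is the regula falsi method; the repetition is our extension, not the paper's. The helper name is hypothetical. It is illustrated on a function with a known root rather than on λ(α) itself.

```python
import math
from typing import Callable


def inverse_interpolation(f: Callable[[float], float], a1: float, a2: float,
                          tol: float = 1e-10, max_iter: int = 200) -> float:
    """Root of f between a1 and a2, given f(a1) > 0 > f(a2).

    Each step takes the zero of the straight line through the two
    bracketing points as the next trial value (regula falsi), then replaces
    the end point with the same sign, so the root stays bracketed.
    """
    f1, f2 = f(a1), f(a2)
    if not f1 > 0 > f2:
        raise ValueError("need f(a1) > 0 > f(a2)")
    a = a1
    for _ in range(max_iter):
        a = a1 + (a2 - a1) * f1 / (f1 - f2)
        fa = f(a)
        if abs(fa) < tol:
            break
        if fa > 0:
            a1, f1 = a, fa
        else:
            a2, f2 = a, fa
    return a


# Illustration on a function with a known root: cos(a) = 0 at a = pi/2.
print(inverse_interpolation(math.cos, 1.0, 2.0))
```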























FIGURE 1. Graph of λ(α) for Sample Data of Section 5.
4. ALTERNATE ESTIMATE OF α BASED ON LEAST OBSERVED VALUE


An alternate and more easily computed estimate of α based on the
least observed value in the sample can be determined if we require
that

$$x_0 - \alpha = \beta e^{\gamma t_0}, \tag{9}$$

where x₀ = x₁ + δ/2, with x₁ being the least sample observation, and δ
the interval of precision, i.e., the smallest scale interval employed in
reading sample measurements. We determine t₀ from the relation


$$\frac{k}{n} = \int_{\alpha}^{x_0} f(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t_0} e^{-t^{2}/2}\,dt, \tag{10}$$

in which k is the number of times the least observed value occurs in the
sample. Taking logarithms of both sides of (9) we have


$$\log(x_0-\alpha) = \log\beta + \gamma t_0. \tag{11}$$


Note that (11) also follows from (2) when x and t assume the specific
values x₀ and t₀ respectively.

On substituting (5) and (6) into (11) we obtain the estimating 
equation 


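$$\theta(\alpha) = n\log(x_0-\alpha) - \sum_{i=1}^{n}\log(x_i-\alpha) - t_0\left[n\sum_{i=1}^{n}\log^{2}(x_i-\alpha) - \left(\sum_{i=1}^{n}\log(x_i-\alpha)\right)^{2}\right]^{1/2} = 0, \tag{12}$$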


which corresponds to maximum likelihood equation (7) but which is 
much simpler to solve since it involves fewer summations. We can 
solve (12) for the estimate α* and subsequently substitute this value
in (5) and (6) to compute estimates β* and γ*. The star (*) is employed
to designate estimates obtained by the procedure of this section,
whereas the conventional circumflex (^) designates maximum likelihood estimates.
The estimate α* is consistent since if the sample were enlarged to include the entire population, the estimate thus computed would be the
actual population value for this parameter. Under these circumstances
δ would approach zero. Consequently it is to be expected that whenever
samples are large, estimates of a computed from (12) will not differ 
greatly from maximum likelihood estimates computed from (7). 
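Equation (12) needs only two sums per trial value of α. The sketch below makes three stated assumptions. First, it reads the left side of (10) as k/n. Second, it takes δ = 0.001 because the sample is recorded to three decimals. Third, it counts k as the number of ties at the minimum. Function names are hypothetical. The check confirms that (12) agrees with (11) after (5) and (6) are substituted.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence

# Sample of Section 5 (20 observations).
SAMPLE = [
    148.290, 144.328, 174.800, 168.554,
    184.101, 166.475, 131.375, 145.788,
    135.880, 137.338, 164.304, 155.369,
    127.211, 132.971, 128.709, 201.415,
    133.143, 155.680, 153.070, 157.238,
]


def t0_from_k(k: int, n: int) -> float:
    """Standard normal deviate with lower-tail probability k/n (our reading of (10))."""
    return NormalDist().inv_cdf(k / n)


def theta(x: Sequence[float], alpha: float, delta: float,
          k: Optional[int] = None) -> float:
    """Equation (12): n log(x0 - alpha) - S_y - t0 * sqrt(n S_yy - S_y**2)."""
    n = len(x)
    x1 = min(x)
    if k is None:
        k = sum(1 for xi in x if xi == x1)   # times the least value occurs
    x0 = x1 + delta / 2.0
    t0 = t0_from_k(k, n)
    ys = [math.log(xi - alpha) for xi in x]
    s_y = sum(ys)
    s_yy = sum(y * y for y in ys)
    return n * math.log(x0 - alpha) - s_y - t0 * math.sqrt(n * s_yy - s_y * s_y)


# delta = 0.001 is an assumption: the sample is recorded to three decimals.
for a in (100.0, 105.0):
    print(f"theta({a:.0f}) = {theta(SAMPLE, a, 0.001): .4f}")
```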
5. A NUMERICAL EXAMPLE


To illustrate the application of the foregoing results in a practical
situation, let us consider the following sample consisting of 20 observations, which was formed by applying the inverse of Equation (2) with
α = 100, β = 50, and γ = 0.4 to a sample selected from Mahalanobis'
Tables [8].


RANDOM SAMPLE FROM LOGARITHMIC NORMAL POPULATION

148.290   144.328   174.800   168.554
184.101   166.475   131.375   145.788
135.880   137.338   164.304   155.369
127.211   132.971   128.709   201.415
133.143   155.680   153.070   157.238


The mean, standard deviation, and third standard moment of this
sample are 152.30, 19.45, and 0.75 respectively. Using α₁ = 126.211,
which is one less than the smallest sample observation, as a first
approximation, we subsequently find λ(117) = 0.5071 and
λ(118) = −0.2362. Interpolating linearly between these two values we
have α̂ = 117.68. The estimates β̂ and γ̂ are obtained by substituting
this value in (5) and (6). In determining the alternate estimate of α
from Equation (12), we obtain θ(100) = 0.2856 and θ(105) = −0.0214.
Interpolating linearly between these values we compute α* = 104.65.
Again we find estimates of β and γ from Equations (5) and (6). Corresponding population parameters, maximum likelihood estimates,
alternate estimates based on least observed values, and moment
estimates computed from Yuan's equations are tabulated below for
comparison.
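Each single linear interpolation quoted above can be checked in two lines. The sketch uses only the λ and θ values given in the text. The helper name is hypothetical.

```python
def linear_root(a1: float, f1: float, a2: float, f2: float) -> float:
    """Zero of the straight line through (a1, f1) and (a2, f2)."""
    return a1 + (a2 - a1) * f1 / (f1 - f2)


# Values of lambda and theta quoted in the text.
alpha_hat = linear_root(117.0, 0.5071, 118.0, -0.2362)
alpha_star = linear_root(100.0, 0.2856, 105.0, -0.0214)
print(round(alpha_hat, 2), round(alpha_star, 2))  # 117.68 104.65
```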


SUMMARY OF ESTIMATES

                          Population   Moment      Maximum       Alternate Estimates
Parameter                 Values       Estimates   Likelihood    (Based on least
                                                   Estimates     observed values)

α                         100.0000      72.90      117.68        104.65
β                          50.0000      77.12       29.21         43.90
γ                           0.4000       0.241       0.604         0.407
μx = α + βω^(1/2)         154.1644     152.30      152.73        152.34
σx = β√(ω(ω − 1))          22.5620      19.45       23.24         20.22
Mex = α + β               150.0000     150.02      146.89        148.55
From the above tabulation, it is noted that alternate estimates of α,
β, and γ based on least observed values are in closer agreement with
population parameters than either maximum likelihood or moment
estimates. This may be largely due to the effects of bias in small sam-
ples, but a more definite statement must await further investigation.
All three sets of estimates of the mean, median, and variance are in
close agreement with population parameters. With larger samples, it is
to be expected that agreement among the different estimates of α, β,
and γ will be somewhat closer than that exhibited in the present il-
lustration.
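The derived rows of the summary table follow from α, β, γ through μ = α + βw^(1/2), σ = β√(w(w − 1)) and Me = α + β, with w = e^(γ²). The sketch below recomputes those rows for each parameter set and also recomputes the sample mean, standard deviation, and third standard moment of the 20 observations. It assumes the divisor n (not n − 1) for the sample moments; that divisor is an assumption, chosen because it reproduces the quoted 19.45 and 0.75.

```python
import math

# The 20 observations of the random sample tabulated above
sample = [
    148.290, 144.328, 174.800, 168.554,
    184.101, 166.475, 131.375, 145.788,
    135.880, 137.338, 164.304, 155.369,
    127.211, 132.971, 128.709, 201.415,
    133.143, 155.680, 153.070, 157.238,
]


def sample_moments(xs):
    """Mean, standard deviation and third standard moment, all with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5


def lognormal_summary(alpha, beta, gamma):
    """Mean, standard deviation and median implied by alpha, beta, gamma, with w = exp(gamma**2)."""
    w = math.exp(gamma ** 2)
    return alpha + beta * math.sqrt(w), beta * math.sqrt(w * (w - 1)), alpha + beta


print("sample:", [round(v, 2) for v in sample_moments(sample)])
for label, params in [
    ("population", (100, 50, 0.4)),
    ("moments", (72.90, 77.12, 0.241)),
    ("maximum likelihood", (117.68, 29.21, 0.604)),
    ("alternate", (104.65, 43.90, 0.407)),
]:
    print(label, [round(v, 2) for v in lognormal_summary(*params)])
```

Small discrepancies in the last digit of the estimate columns are expected, since the tabulated α, β, γ are themselves rounded.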

6. PRECISION OF ESTIMATES


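The likelihood information determinant, taking the parameters in
the order α, β, γ, is found to be

\[
\begin{vmatrix}
\dfrac{n w^{2}(1+\gamma^{2})}{\beta^{2}\gamma^{2}} & \dfrac{n w^{1/2}}{\beta^{2}\gamma^{2}} & -\dfrac{2n w^{1/2}}{\beta\gamma} \\[2ex]
\dfrac{n w^{1/2}}{\beta^{2}\gamma^{2}} & \dfrac{n}{\beta^{2}\gamma^{2}} & 0 \\[2ex]
-\dfrac{2n w^{1/2}}{\beta\gamma} & 0 & \dfrac{2n}{\gamma^{2}}
\end{vmatrix}
= \frac{2n^{3}w}{\beta^{4}\gamma^{6}}\left[w(1+\gamma^{2}) - 2\gamma^{2} - 1\right],
\]

with w = e^(γ²).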
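The entries of the determinant can be cross-checked numerically. The sketch below builds the expected information matrix in the order (α, β, γ), compares its determinant with the closed form 2n³w[w(1 + γ²) − 2γ² − 1]/(β⁴γ⁶), and inverts it to obtain asymptotic variances. The parameter values (α = 100, β = 50, γ = 0.4, n = 20) are borrowed from the illustration purely as an example, and the helper names are hypothetical; plain Python is used so that no external library is assumed.

```python
import math


def information_matrix(alpha, beta, gamma, n=1):
    """Expected information for (alpha, beta, gamma) when log(x - alpha) is normal
    with mean log(beta) and standard deviation gamma (alpha does not enter the entries)."""
    w = math.exp(gamma ** 2)
    rw = math.sqrt(w)
    bg2 = beta ** 2 * gamma ** 2
    return [
        [n * w ** 2 * (1 + gamma ** 2) / bg2, n * rw / bg2, -2 * n * rw / (beta * gamma)],
        [n * rw / bg2, n / bg2, 0.0],
        [-2 * n * rw / (beta * gamma), 0.0, 2 * n / gamma ** 2],
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse3(m):
    """Inverse of a 3x3 matrix as adjugate divided by determinant."""
    d = det3(m)
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            minor = [[m[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
            cof[i][j] = (-1) ** (i + j) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]  # transpose of the cofactors


def closed_form_det(beta, gamma, n):
    w = math.exp(gamma ** 2)
    return 2 * n ** 3 * w / (beta ** 4 * gamma ** 6) * (w * (1 + gamma ** 2) - 2 * gamma ** 2 - 1)


info = information_matrix(100.0, 50.0, 0.4, n=20)
cov = inverse3(info)  # asymptotic covariance matrix of the maximum likelihood estimates
print("determinant:", det3(info), "closed form:", closed_form_det(50.0, 0.4, 20))
print("asymptotic s.e. of alpha, beta, gamma:", [math.sqrt(cov[k][k]) for k in range(3)])
```

The bracket w(1 + γ²) − 2γ² − 1 is small for small γ, so the determinant is small and the variances of the estimates are correspondingly large.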


From this we obtain the following asymptotic variances and co- 
variances of the maximum likelihood estimates: 
7. WHEN α IS KNOWN A PRIORI


When the terminus α is known a priori, the maximum likelihood
equations (4) for estimating β and γ become

\[
\log\hat\beta = \frac{1}{n}\sum_{i=1}^{n}\log(x_i - \alpha), \qquad
\hat\gamma^{2} = \frac{1}{n}\sum_{i=1}^{n}\log^{2}(x_i - \alpha) - \left(\frac{1}{n}\sum_{i=1}^{n}\log(x_i - \alpha)\right)^{2}
\tag{14}
\]

and the variances and covariances (11) become

\[
V(\hat\beta) = \frac{\beta^{2}\gamma^{2}}{n}, \qquad
V(\hat\gamma) = \frac{\gamma^{2}}{2n}, \qquad
\operatorname{Cov}(\hat\beta, \hat\gamma) = 0.
\]
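With α known, equations (14) reduce to the mean and (divisor n) variance of log(x − α). A minimal sketch, assuming a hypothetical helper name and a two-point check set placed exactly at log β ± γ, so that the estimates must come out as β = 50 and γ = 0.4:

```python
import math


def estimates_alpha_known(xs, alpha):
    """Equations (14) with the terminus alpha treated as known, plus the asymptotic
    variances beta^2 gamma^2 / n and gamma^2 / (2n) evaluated at the estimates."""
    logs = [math.log(x - alpha) for x in xs]
    n = len(logs)
    mean_log = sum(logs) / n
    gamma_sq = sum(v * v for v in logs) / n - mean_log ** 2
    beta_hat = math.exp(mean_log)
    gamma_hat = math.sqrt(gamma_sq)
    var_beta = beta_hat ** 2 * gamma_hat ** 2 / n
    var_gamma = gamma_hat ** 2 / (2 * n)
    return beta_hat, gamma_hat, var_beta, var_gamma


# Two points at log(beta) +/- gamma around alpha = 100, with beta = 50 and gamma = 0.4
check = [100 + 50 * math.exp(0.4), 100 + 50 * math.exp(-0.4)]
print(estimates_alpha_known(check, 100.0))
```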
RECENT MORTALITY TRENDS AND DIFFERENTIALS


Iwao M. Moriyama 
Public Health Service 


THE spectacular rise in the birth rate during the war and postwar
years has all but overshadowed the very significant decline in
mortality. Except for the upward movement in the death rate in 1943
due to an influenza epidemic, the age-adjusted death rate for the
continental United States declined steadily each year during the past
decade. New record lows in mortality were established every succeed-
ing year. In all, mortality as measured by the age-adjusted death
rate¹ declined about 18 per cent in the 10 years between 1940 and
1949. In the previous decade, the decrease in mortality was 15 per
cent.

For the first time in the history of the country, the crude death rate 
dropped to 10.0 per 1,000 population in 1946. In 1948, the crude death 
rate further declined to 9.9, and to 9.7 in 1949. If the death rate pre- 
vailing before World War II had continued through the decade, there 
would have been about 1,300,000 fewer people enumerated in the 
1950 census of population. 

The death rates during the war years are difficult to evaluate. The 
withdrawal of men from the civilian population for military service, 
and the large shifts in population to defense areas created a problem in 
the analysis of mortality rates. For example, there was the problem 
of residence allocation of decedents who were stationed in military 
establishments and away from home at the time of death. Because 
most of these deaths resulted from the hazards of the area in which 
the men were stationed, it was decided that mortality statistics would 
be more meaningful if the deaths of members of the armed forces 
occurring in the United States were allocated to the military station 
as the place of residence. Then, there was the question of the population 
base for computing death rates. Since mortality statistics for the 
United States did not include deaths of the armed forces overseas, 
the logical population base was the de facto population. The de facto 
population was also the logical base for the computation of death rates 
for the individual States, because the deaths among the armed forces 
stationed in camps within the State were allocated there as the place 
of residence. 

The results of these computational procedures correctly indicate the
mortality risk for the population present in an area in a particular
year. However, when these rates are compared with those for other
years, changes in population characteristics affect the com-
parability of the crude death rates and of the death rates for certain age
groups. For example, the crude death rate for Arizona dropped pre-
cipitously in 1942 and 1943 merely because of the establishment of
large military camps in the State, which markedly increased the de-
nominator of the rate but contributed relatively little to the number of
deaths. There are other examples of distortion in death rates. The
crude death rates for heart diseases and other conditions common in
the older ages, and for diseases such as tuberculosis, showed increases
with the movement of the armed forces out of the country. These
apparent increases in the crude rates were artificial. In the case of
diseases of old age, very few men of military age normally die of such
diseases, yet they are normally included in the denominator of the
rates. In the case of diseases such as tuberculosis, very few deaths
occurred overseas because the men in the armed forces were screened
for certain diseases and defects prior to their induction. Thus, the
death rates computed on the de facto population base indicated higher
mortality even though there was no actual change in the risk of dying
from these causes.

¹ Computed by the direct method using as the standard population the age distribution of the
population of the United States as enumerated in 1940.

We have here examples of a technically correct population base 
giving misleading results. For many causes of death, use of the de 
jure rather than the de facto population is more satisfactory. However, 
it is not possible to make a precise evaluation of changes in mortality 
conditions in the United States during the war years because the 
populations were not comparable with those of the prewar and post- 
war years. 

From some points of view, comparability of mortality risks is un- 
important. For example, as a measure of one element of population 
change, it is sufficient to know only the rate of dying of the U. S. 
population regardless of the conditions under which the deaths oc- 
curred. Figure 1 shows the crude death rates including and excluding 
deaths among the armed forces overseas, for the years 1930-1949. 
The crude death rates including the armed forces overseas indicate 
the attritional effects of the war on the population of the United 
States. The number of deaths among the armed forces overseas was
relatively high from 1942 to 1945, inclusive. The crude death rate for
the total U. S. population reached a wartime peak of 11.4 per 1,000
population in 1944. Incidentally, this rate was the highest recorded
since 1936.
The crude death rates excluding the armed forces overseas show a
sharp increase in mortality in 1943, a relatively high level in 1944 and
1945, and a drop-off in 1946. Part of the increase in 1943 was due to
the influenza epidemic in the winter of that year. However, a good part
of the apparently high crude death rates during the years 1943-1945
resulted from the withdrawal of young men from the continental popu-
lation for service overseas. The age-adjusted death rates, also shown
in figure 1, minimize the effect of the overseas movement of the popu-
lation on mortality rates because the death rates for the younger
ages are given less weight in the adjusted rate. The age-adjusted
death rates indicate a relatively steady decline in mortality for the
continental population, except for the rise in the death rate in 1943.
It would appear from these figures that the mortality experience of
the continental population of the United States was rather favorable
during the last decade.

FIGURE 1
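The direct method of age adjustment named in footnote 1 weights each age-specific rate by a fixed standard population, so that a change in the age make-up of the base population moves the crude rate but not the adjusted rate. A minimal sketch with invented numbers: two hypothetical age groups, a hypothetical withdrawal of 200 young men, and a hypothetical two-group standard standing in for the 1940 age distribution.

```python
def weighted_rate(rates_per_1000, population):
    """Average of age-specific rates weighted by the given population counts."""
    total = sum(population)
    return sum(r * p for r, p in zip(rates_per_1000, population)) / total


# Hypothetical age-specific death rates per 1,000: [young adults, older ages]
rates = [1.0, 10.0]

# Hypothetical civilian populations before and after 200 young men leave for service overseas
before = [900, 100]
after = [700, 100]

# Hypothetical standard population (stands in for the 1940 age distribution)
standard = [800, 200]

crude_before = weighted_rate(rates, before)        # crude rate: weights are the actual population
crude_after = weighted_rate(rates, after)
adjusted_before = weighted_rate(rates, standard)   # direct method: weights are the fixed standard
adjusted_after = weighted_rate(rates, standard)

print(crude_before, crude_after, adjusted_before, adjusted_after)
```

The crude rate rises although no age-specific rate changed, which is the distortion described above; the adjusted rate is unaffected because its weights never change.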

Data on overseas deaths by age for the war years are not yet avail-
able. Therefore, figure 2 presents death rates by age for the
past decade based on deaths registered in the continental United
States and on the de jure population. The rates shown by the broken
lines are based on the de facto population.

FIGURE 2. DEATH RATES BY AGE: UNITED STATES, 1940-1949


The general mortality picture for the past decade is one of declining
rates for every age group. The death rate for each age group in 1949
was the lowest ever recorded for the country. The greatest reductions in
mortality were recorded for the younger ages, although the population
in each age group contributed to the decline in the total death rate.

Between 1940 and 1949, the infant mortality rate (based on
live births) declined 34 per cent; the death rate for the age group 1-14
years, 40 per cent; the death rate for the age group 15-24, 35 per cent;
and the death rate for the 25-44 year age group, 30 per cent.
The death rates for the older age groups did not fall as rapidly as
those for the younger groups. However, significant reductions in
mortality are noted even in the older age groups. The estimated death
rates for the age groups 45-64, 65-74, and 75 years and over in 1949
were 13, 10, and 11 per cent, respectively, lower than the death rates
for the corresponding age groups in 1940.
The death rates for the two age groups 15-24 and 25-44 do not show
a continuous decline through the 10-year period when computed on the 
de facto population base. The withdrawal of men for service overseas 
has the effect of flattening out the death rate curves during the war 
years. When the de jure population base is used, the decline in the 
death rates for the age groups 15-24 and 25-44 is unmistakable. 
However, the inclusion of deaths among the armed forces overseas 
would have the effect of significantly raising the death rates for these 
age groups during the war years. 

It was mentioned before that the age-adjusted death rate declined 
at a slightly greater rate during the decade, 1940-1949, than in the 
previous 10 years. Because age-adjusted death rates are heavily 
weighted by the mortality experience of infants, examination might be 
made of a mortality measure excluding the population under 1 year 
of age. Such a measure is the average remaining lifetime at age 1. 
These life table values indicate that the greater decrease in the age- 
adjusted death rate during the past decade as compared with that for 
the previous 10 years was due primarily to the greater improvement 
in the infant mortality rate during the past 10 years. Except for the 
mortality experience among nonwhite females, the relative increase 
in the average remaining lifetime at age 1 was greater between 1930 
and 1940 than between 1940 and 1948. This would be true even if al- 
lowance were made for the fact that the period 1940-1948 does not 
represent a complete decade. 

The death rate for females (figure 3) has been lower than that for
males. During the past 10 years, the differential in mortality between
the two sexes has been growing wider. For example, the age-adjusted 
death rate for males in 1930 was 20 per cent higher than that for fe- 
males. In 1940, the difference was 28 per cent and in 1949, the sex 
differential was 44 per cent. 

It is also significant that the once large differential in mortality 
between races is getting narrower. In 1930, the age-adjusted death 
rate for the nonwhite race was 72 per cent higher than that for whites. 
In 1940, the difference was 59 per cent. By 1949, the differential had 
decreased to 26 per cent. 

The mortality experience during the past decade has been extremely
favorable. There was considerable concern expressed during the war
years over conditions believed to be conducive to an increase in mor-
tality in the United States, but these fears did not, fortunately, ma-
terialize. Tuberculosis mortality did not show the anticipated increase
in the face of the stress and strain of wartime conditions and long hours
of work. There were no influenza epidemics of the proportion that
followed in the wake of World War I, in spite of the concentration of
the population in military camps and defense areas. The infant and
maternal mortality rates continued to decline through the period of
the great boom in the birth rate.





FIGURE 3. AGE-ADJUSTED DEATH RATES BY SEX AND BY RACE: UNITED STATES, 1930-1949


The draining of physicians from civilian life for military service 
was cause for alarm from the standpoint of preserving the health of the 
civilian population. There was no question that the doctors in civilian 
practice were overworked, but the increased use of hospital facilities 
probably resulted in a more efficient utilization of medical personnel. 

Of the principal causes of death, the crude death rate for cancer was 
about 11 per cent higher in 1948 than it was in 1940. Comparison of 
the age-adjusted rates for the same period indicates an increase in 
the cancer death rate of less than 5 per cent. (This discussion on causes 
of death will not include data for 1949 in order not to get into the 
problem of comparability of causes of death resulting from the 1948 
revision of the International Statistical Classification of Diseases, 
Injuries and Causes of Death and the change in the method of select- 
ing the underlying cause of death.) The mortality from the other major 
causes of death did not change or underwent substantial declines.
The crude death rate for heart diseases in 1948 was about 10 per cent
higher than it was in 1940, but almost all of this difference can be
attributed to the aging population. Because of the relatively large de-
crease in the death rate for nephritis and intracranial lesions of vascular
origin, there appears to be an actual decline in the mortality over the 
same period from the total cardiovascular renal complex of diseases. 
The age-adjusted death rate for pneumonia and influenza dropped 95 
per cent between 1940 and 1948, thanks to the newfound drugs and 
antibiotics. A large part of this reduction in rate occurred in the older 
ages and accounts for a portion of the decrease in mortality from the 
cardiovascular renal diseases because of the association between influ- 
enza and pneumonia mortality and excess deaths in the older ages not 
assigned to respiratory causes. The decline of 36 per cent in tubercu- 
losis mortality was also impressive. Motor vehicle accidents as a cause 
of death dropped off markedly during the war period when gasoline 
rationing severely restricted automobile travel. Although the death 
rate for motor vehicle accidents in 1948 was not as high as it was in 
1940, it appears to be coming back to its prewar level. Mortality from 
nonmotor vehicle accidents hit a high level during the war because 
of the increased military training and industrial activities, but it has 
dropped off in the postwar period. 

It is difficult to pin down precisely all of the reasons for the favorable 
mortality conditions during the past 10 years. In one respect, it is a 
continuation of the declining trend. This decline was probably ac- 
celerated by at least three factors, namely, better nutritional status 
of the general population due to increased prosperity during the war 
and postwar years, the widespread use of drugs and antibiotics in the 
control of infection, and the intensification of public health activities. 

The average length of life based on mortality rates for 1948, the 
latest year for which such data are available, was the highest ever 
recorded for the United States. The average expectation of life at 
birth was 67.2 years for the total population, 65.5 for white males, 71.0 
for white females, 58.1 for nonwhite males, and 62.5 years for nonwhite 
females. In the 9 years between 1940 and 1948, the average duration 
of life has increased 2.7 years for white males, 3.2 for white females, 
5.8 for nonwhite males, and 7.0 years for nonwhite females. 

The decade just past has been a remarkable one insofar as the U.S. 
mortality experience is concerned. It seems likely that the declining 
death rates will continue into the present decade. The expected upturn 
in general mortality due to the aging population does not appear to 
be in sight in the immediate future. 





ON THE VARIANCE OF ESTIMATES OF THE STANDARD 
DEVIATION AND VARIANCE 


W. Duane Evans 
U. S. Bureau of Labor Statistics 


MANY current statistical texts either omit expressions for the vari-
ance of sample estimates of the standard deviation and variance,
or include results which are valid only for sampling from normal or 
infinite populations. In view of the extensive current use of sampling 
in the socio-economic field, where normal or infinite populations are 
not always conveniently available, this seems unfortunate. The present 
paper is intended to provide a convenient derivation and summary 
of references. 

THE ESTIMATED VARIANCE 


The following deals with a sample of size n drawn from a population
of size N. The members of this population exhibit the values X₁,
X₂, ..., X_N for a variate X. The r-th moment about its mean of
this distribution is represented by μ_r, the standard deviation by σ,
and the mean by X̄. Estimates are denoted by primes.

The sample variance, represented by S², may be defined as follows:

\[
S^{2} = \frac{1}{n}\sum_{i=1}^{N} a_i X_i^{2} - (\bar{X}')^{2}
\tag{1}
\]


where X̄′ is the sample mean and an unbiased estimate of X̄. Here aᵢ
represents a characteristic variate which is defined as equal to one if
the i-th individual is included in the sample and zero otherwise, as
suggested by Cornfield [1]. In this form, it is the aᵢ, not the Xᵢ, that
vary from sample to sample, and all summations cover all values of
Xᵢ (hence the limits generally will be omitted).

Following Cornfield [1], it may be shown that

\[
E(a_i a_j a_k \cdots) = \frac{n(n-1)(n-2)\cdots}{N(N-1)(N-2)\cdots}
\qquad (i, j, k, \ldots \text{ mutually unequal})
\tag{2}
\]
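Result (2) is a counting fact about simple random sampling without replacement: of the C(N, n) equally likely samples, the share containing k specified units is C(N − k, n − k)/C(N, n), which equals the ratio of falling factorials in (2). A quick enumeration sketch, with N = 6 and n = 3 chosen arbitrarily and hypothetical helper names, confirms it in exact fractions:

```python
from fractions import Fraction
from itertools import combinations


def joint_inclusion_by_enumeration(N, n, units):
    """Share of all C(N, n) samples that contain every unit in `units`."""
    samples = list(combinations(range(N), n))
    hits = sum(1 for s in samples if all(u in s for u in units))
    return Fraction(hits, len(samples))


def joint_inclusion_formula(N, n, k):
    """Equation (2): n(n-1)...(n-k+1) / [N(N-1)...(N-k+1)] for k distinct units."""
    num, den = 1, 1
    for t in range(k):
        num *= n - t
        den *= N - t
    return Fraction(num, den)


for k in (1, 2, 3):
    units = tuple(range(k))
    print(k, joint_inclusion_by_enumeration(6, 3, units), joint_inclusion_formula(6, 3, k))
```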


Taking expected values in (1), it is seen that the first term on the right
becomes the mean of the sum of squares in the population, and the
second, from a well-known result, is the square of the population mean
plus the variance of the estimate of the mean:

\[
E(S^{2}) = \mu_2 - \frac{\mu_2(N-n)}{n(N-1)}
\tag{3}
\]

\[
E(S^{2}) = \frac{N(n-1)}{n(N-1)}\,\mu_2
\tag{4}
\]

The origin of these expressions seems lost, if not in antiquity, at least
in obscurity. (See, for example, Pearson [2].)
It is clear that if an estimate of μ₂ is defined as

\[
\mu_2' = \frac{n(N-1)}{N(n-1)}\,S^{2}
\tag{5}
\]

then

\[
E(\mu_2') = \mu_2
\tag{6}
\]

and the estimate will be unbiased.


The difference between the estimate and its mean value may be set
down explicitly as follows:

\[
\mu_2' - \mu_2 = \frac{N-1}{nN}\sum_{i} a_i X_i^{2}
- \frac{N-1}{n(n-1)N}\sum_{i \neq j} a_i a_j X_i X_j - \mu_2
\tag{7}
\]
Squaring this expression and taking its expected value, using (2), will
yield (after considerable simplification) the following exact expression
for the variance of the estimated variance:

\[
\mu_2(\mu_2') = \frac{(N-n)\left[\mu_4(N-1)(nN-N-n-1) - \mu_2^{2}(nN^{2}-3N^{2}+6N-3n-3)\right]}{n(n-1)N(N-2)(N-3)}
\tag{8}
\]

This essential result seems to have been derived independently and 
at about the same time by Neyman [3] and Tschuprow [4]. Their 
expressions are for the variance of S? rather than the unbiased es- 
timate of (5), but the two differ, of course, only by the square of the 
bias correction factor in (5). 
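Formula (8) can be checked exactly on a small population by listing every possible sample. The sketch below does that in rational arithmetic for a hypothetical population of six values and also confirms the unbiasedness statement (6). It uses +6N inside the μ₂² coefficient; with that sign the formula agrees with the exhaustive enumeration, so a −6N reading should be treated as a transcription slip. The population values and helper names are arbitrary.

```python
from fractions import Fraction
from itertools import combinations


def central_moment(values, r):
    """r-th central moment of a finite population (divisor N)."""
    N = len(values)
    mean = Fraction(sum(values), N)
    return sum((x - mean) ** r for x in values) / N


def mu2_prime(sample, N):
    """Unbiased variance estimate (5) computed from the sample variance (1)."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    s2 = Fraction(sum(x * x for x in sample), n) - xbar ** 2   # equation (1)
    return Fraction(n * (N - 1), N * (n - 1)) * s2              # equation (5)


def variance_formula_8(N, n, mu2, mu4):
    """Exact variance (8) of mu2' under simple random sampling without replacement."""
    num = (N - n) * (mu4 * (N - 1) * (n * N - N - n - 1)
                     - mu2 ** 2 * (n * N ** 2 - 3 * N ** 2 + 6 * N - 3 * n - 3))
    return num / (n * (n - 1) * N * (N - 2) * (N - 3))


population = [2, 3, 5, 7, 11, 13]   # hypothetical finite population
N, n = len(population), 3

# Every sample of size n is equally likely, so these averages are exact expectations
estimates = [mu2_prime(s, N) for s in combinations(population, n)]
mean_of_estimates = sum(estimates) / len(estimates)
exact_variance = sum((e - mean_of_estimates) ** 2 for e in estimates) / len(estimates)

mu2 = central_moment(population, 2)
mu4 = central_moment(population, 4)
print(mean_of_estimates, mu2)
print(exact_variance, variance_formula_8(N, n, mu2, mu4))
```

Exact fractions are used so that agreement is an identity rather than a match within floating-point tolerance.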

Various simpler formulations are available as approximations under
limiting conditions. For example, if both N and n are reasonably
large, (8) will reduce to

\[
\mu_2(\mu_2') = \frac{(N-n)(\mu_4 - \mu_2^{2})}{(n-1)N} = \frac{(N-n)\,\mu_2^{2}(\beta_2 - 1)}{(n-1)N}
\tag{9}
\]
where β₂ = μ₄/μ₂² is the familiar Pearson kurtosis criterion. This is probably the
most generally useful form. As N alone becomes very large both abso-
lutely and in relation to n, (8) approaches the exact limit

\[
\mu_2(\mu_2') = \frac{\mu_2^{2}(\beta_2 - 1)}{n-1}\left[1 - \frac{\beta_2 - 3}{n(\beta_2 - 1)}\right]
\tag{10}
\]

This is identical with an expression given by Wilks [5], p. 134. As β₂
approaches the normal curve value of 3, (10) becomes the usual text-
book formula, except that n is commonly substituted for (n − 1). Since
large values for β₂ (ranging up to as much as 25 for income distribu-
tions) are not infrequently encountered in socio-economic surveys, the
deficiency of the usual formula is apparent.
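To see how much the normal-theory formula can understate the sampling variance when β₂ is large, the sketch below evaluates (9) and (10) next to the textbook value 2μ₂²/n for a few kurtosis values. The sample size, the β₂ values, and the function names are illustrative choices, not figures from the paper.

```python
def var_large_N_n(N, n, mu2, beta2):
    """Equation (9): approximate variance of mu2' when N and n are both large."""
    return (N - n) * mu2 ** 2 * (beta2 - 1) / ((n - 1) * N)


def var_infinite_N(n, mu2, beta2):
    """Equation (10): limit of (8) as N grows without bound."""
    return mu2 ** 2 * (beta2 - 1) / (n - 1) * (1 - (beta2 - 3) / (n * (beta2 - 1)))


def var_textbook(n, mu2):
    """Usual normal-theory formula 2 * mu2**2 / n."""
    return 2 * mu2 ** 2 / n


n, mu2 = 100, 1.0
for beta2 in (3.0, 9.0, 25.0):
    ratio = var_infinite_N(n, mu2, beta2) / var_textbook(n, mu2)
    print(f"beta2 = {beta2:4.1f}: (10) / textbook = {ratio:.2f}")
```

At β₂ = 25 the ratio is about 12, so a standard error computed from the textbook formula would be too small by a factor of roughly 3.5.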

The unwieldy third and fourth moments of S² (from which the
corresponding moments of μ₂′ follow directly) were computed by
Church [6] and Isserlis [7], the latter correcting the fourth moment
formulation of the former.

Little attention seems to have been given to the problem of es-
timating a population variance from the results of a stratified sample.
As an intuitive approach, one might form a weighted estimate of the
population sum of squares and subtract the square of the estimated
population mean, as follows:

\[
\mu_2'' = \frac{1}{N}\sum_{i=1}^{r}\frac{N_i}{n_i}\sum_{j=1}^{N_i} a_{ij}X_{ij}^{2} - (\bar{X}')^{2}
\tag{11}
\]


This supposes r strata with a sample of nᵢ and a total of Nᵢ individuals
in the i-th stratum. The term aᵢⱼ is defined as 1 or 0 depending on
whether the j-th individual in the i-th stratum is in or not in the
sample. By the same line of reasoning used in deriving (4),

\[
E(\mu_2'') = \mu_2 - \mu_2(\bar{X}')
\tag{12}
\]


where μ₂(X̄′) is the variance of the estimate of the mean. This is
known to be (Neyman [8])

\[
\mu_2(\bar{X}') = \sum_{i=1}^{r}\frac{p_i^{2}\sigma_i^{2}(N_i - n_i)}{n_i(N_i - 1)}
\tag{13}
\]

where pᵢ (= Nᵢ/N) is the proportion of the population to be found in
the i-th stratum and σᵢ² is the variance in this stratum. Since an un-
biased estimate of this last quantity is available from (4), an unbiased
estimate of the population variance may be formed as follows:
\[
\mu_2''' = \mu_2'' + \sum_{i=1}^{r}\frac{p_i^{2}S_i^{2}(N_i - n_i)}{N_i(n_i - 1)}
\tag{14}
\]

\[
E(\mu_2''') = \mu_2
\]

where Sᵢ² is the sample variance, defined as in (1), within the i-th stratum.
No attempt has been made to compute the variance of this estimate.
The task should be tedious but not difficult, and the result highly
complex and of limited usefulness.
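The unbiasedness claimed for (14) can be confirmed by brute force. The sketch below lists every stratified sample from a small hypothetical population, averages the intuitive estimate (11) and the corrected estimate (14), and compares both with μ₂ (divisor N). It assumes that Sᵢ² in (14) is the within-stratum sample variance with divisor nᵢ, as in (1); the strata, sample sizes, and function name are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def stratified_variance_estimates(samples, sizes):
    """Return (mu2'', mu2''') of equations (11) and (14) for one stratified sample.

    samples: one tuple of sampled values per stratum; sizes: stratum sizes N_i.
    """
    N = sum(sizes)
    mean_sq = Fraction(0)     # (1/N) sum_i (N_i/n_i) sum_j a_ij X_ij^2
    mean_est = Fraction(0)    # stratified estimate of the population mean
    correction = Fraction(0)  # sum_i p_i^2 S_i^2 (N_i - n_i) / (N_i (n_i - 1))
    for s, Ni in zip(samples, sizes):
        ni = len(s)
        p = Fraction(Ni, N)
        xbar = Fraction(sum(s), ni)
        sq = Fraction(sum(x * x for x in s), ni)
        mean_sq += p * sq
        mean_est += p * xbar
        S2 = sq - xbar ** 2   # within-stratum sample variance, divisor n_i as in (1)
        correction += p ** 2 * S2 * Fraction(Ni - ni, Ni * (ni - 1))
    mu2_dd = mean_sq - mean_est ** 2
    return mu2_dd, mu2_dd + correction


# Hypothetical population in two strata, with n_i = 2 drawn from each
strata = [(1, 4, 6), (10, 12, 15, 21)]
n_per_stratum = (2, 2)
sizes = [len(s) for s in strata]

# All combinations of within-stratum samples are equally likely
draws = list(product(*(combinations(s, k) for s, k in zip(strata, n_per_stratum))))
results = [stratified_variance_estimates(d, sizes) for d in draws]
mean_dd = sum(r[0] for r in results) / len(results)
mean_ddd = sum(r[1] for r in results) / len(results)

population = [x for s in strata for x in s]
pop_mean = Fraction(sum(population), len(population))
mu2 = sum((x - pop_mean) ** 2 for x in population) / len(population)

print("mu2 =", mu2, " E(mu2'') =", mean_dd, " E(mu2''') =", mean_ddd)
```

The intuitive estimate falls short of μ₂ by exactly the variance (13) of the stratified mean, which is the bias that the added term in (14) removes.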





THE ESTIMATED STANDARD DEVIATION 


Because fractional powers of functions of sample values are involved, 
no exact expressions for the bias or variance of estimates of the standard 
deviation are available except for special cases. To derive an approxi- 
mation, suppose that the square root of the estimated variance, as 
defined in (5), is taken as an initial estimate of the standard deviation. 


\[
\sigma' = \sqrt{\mu_2'}
\tag{15}
\]

\[
\sigma' = \sigma\sqrt{1 + \frac{\mu_2' - \mu_2}{\mu_2}}
\tag{16}
\]


Assuming that relative sampling errors as great as +1 can be 
excluded (the conditions can be made somewhat less rigorous but this 
adds to the length of the proof), this equation may be expanded in the 
convergent series 








\[
\sigma' = \sigma\left[1 + \frac{\mu_2' - \mu_2}{2\mu_2} - \frac{(\mu_2' - \mu_2)^{2}}{8\mu_2^{2}} + \cdots\right]
\tag{17}
\]


Taking expected values over the first three terms of the series, the
second vanishes, and the third becomes the variance of the estimated
variance. Using (9),

\[
E(\sigma') = \sigma\left[1 - \frac{(N-n)(\beta_2 - 1)}{8(n-1)N}\right]
\tag{18}
\]


The bracketed expression measures (approximately) the extent of bias 
in the estimate. The bias clearly declines rapidly with increasing sample 
size. 

In most cases no bias correction will be required, but if one is de-
sired it can be derived from (18). Representing the bracket above by
K, we may set

\[
\sigma'' = \sigma'/K
\tag{19}
\]

and the expected value of σ″, within the limits of the approximation,
will equal σ.

Substituting the first two terms of the expansion of (17) in (19),
subtracting σ, squaring, taking expected values, and again using (9)
yields the following expression for the variance of the estimate of the
standard deviation:

\[
\mu_2(\sigma'') = \frac{\sigma^{2}(N-n)(\beta_2 - 1)}{4(n-1)NK^{2}}\left[1 + \frac{(N-n)(\beta_2 - 1)}{16(n-1)N}\right]
\tag{20}
\]

In most cases the bias correction term K² and the bracketed term may
be dropped. The usual textbook formula frequently uses the normal
curve value for β₂, and apparently universally drops the finite sam-
pling factor and substitutes n for (n − 1). This latter substitution is
perhaps not important in view of the fact that the result is approxi-
mate at best, but the retention of (n − 1) does serve as a reminder that
estimates of the variance and standard deviation require at least two
units in the sample.
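The size of the bias factor K and of the variance of the corrected estimate σ″ = σ′/K is easy to tabulate. A short sketch follows; the population size, sample size, β₂ values, and function names are illustrative assumptions, and the variance expression inherits the approximations made in the series expansion of σ′ and in the large-sample variance of μ₂′.

```python
import math


def relative_var_mu2(N, n, beta2):
    """Relative variance of mu2' from the large-sample approximation: (N - n)(beta2 - 1) / ((n - 1) N)."""
    return (N - n) * (beta2 - 1) / ((n - 1) * N)


def bias_factor(N, n, beta2):
    """K: E(sigma') is approximately sigma * K."""
    return 1 - relative_var_mu2(N, n, beta2) / 8


def var_sigma_corrected(N, n, sigma, beta2):
    """Approximate variance of the bias-corrected estimate sigma'' = sigma' / K."""
    v = relative_var_mu2(N, n, beta2)
    K = bias_factor(N, n, beta2)
    return sigma ** 2 * v / (4 * K ** 2) * (1 + v / 16)


# Illustrative values: a sample of 101 from a population of 5,000
for beta2 in (3.0, 25.0):
    K = bias_factor(5000, 101, beta2)
    se = math.sqrt(var_sigma_corrected(5000, 101, 1.0, beta2))
    print(f"beta2 = {beta2}: K = {K:.4f}, s.e. of sigma''/sigma = {se:.4f}")
```

For β₂ = 3 and a very large population the result is close to the familiar σ²/(2(n − 1)).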
TRANSFORMATION FUNCTIONS IN THE THEORY
OF PRODUCTION INDEXES 


Paul B. Stimpson
University of Oregon 

Indexes of output are defined by a transformation function 
expressing alternative outputs of given productive factors. 
Different interpretations of the function are possible, depend- 
ing on the definition of productive factors. Paasche and Las- 
peyres formulas are derived as linear approximations. The 
error is related to cost conditions of production. 


1. THE PROBLEM 


GIVEN two sets of final outputs, say q^I = (q_1^I, q_2^I, ..., q_n^I) and
q^II = (q_1^II, q_2^II, ..., q_n^II), where q_i^I and q_i^II represent quantities of
the i-th commodity at different times or places, an index of production
can be defined by using a transformation function

\[
z = Q(q_1, q_2, \ldots, q_n) = Q(q)
\tag{1}
\]


A given z defines a constant-product function comprising all com- 
binations of outputs achievable with the same input of productive 
factors. A larger z indicates a higher level of output. 

Production indexes based on this notion have been introduced into 
the discussion of social income by Hicks [4, 5], Little [8, 9], Kuznets 
[7], and Samuelson [11]. The price aspects of the approach have been 
considered by Court and Lewis [2]. Little formal development of the 
theory has been presented, however, possibly because equation (1) 
is mathematically analogous to the preference function which forms 
the basis of the economic theory of cost of living indexes, a theory 
which has been developed by a long line of writers, including Konus 
[6], Staehle [12], Frisch [3], Samuelson [11], Ulmer [13, 14], and Allen 
[1]. Differences arise, however, because the transformation function 
applies to an economic system, whereas a preference function applies 
to an individual, and because the restrictions on the shapes of the 
functions are different. 

In the next three sections index number theory based on (1) will 
be developed. In the fifth section we shall consider the significance of 
the fundamental transformation function. 


2. DEFINING THE INDEX 


The slope of a constant-product function, Q_i/Q_j (where Q_i
= ∂Q/∂q_i), is the marginal rate of substitution of j for i. Curve II of
Figure 1 illustrates a constant-product curve in two variables with
increasing rates of marginal substitution. Economies of scale in pro-
duction exist. In curve I, the marginal rate of substitution decreases,
indicating diminishing returns when production is expanded.
FIGURE 1

Increasing and decreasing returns to scale as a property of a con-
stant-product function must be distinguished from a similar property
pertaining to the level of production. In Figure 1, in passing from
curve I to curve II along a straight line through the origin, a change
in slopes indicates a change in marginal rates of substitution for sets
of outputs with the same ratios of commodities. It is clear, for example,
that the slopes of the two curves in Figure 1 are not the same at q^I
and at r^I q^I. In the language of geometry, two constant-product func-
tions z^a and z^b are homothetic for a set S of output ratios s = (1, s_2,
..., s_n), if for each q^a on z^a and q^b on z^b having the same s, there
exists an r such that

\[
r z^{a} = Q(r q^{a}) = z^{b} = Q(q^{b})
\tag{2}
\]

Homothetic curves have the same slope for equal output ratios, as is
apparent from their property of homogeneity.

If q^I and q^II lie on homothetic constant-product functions, an index
of output for II relative to I may be defined as

\[
r = Q(q^{II})/Q(q^{I})
\tag{3}
\]

If the transformation function is not homothetic, the ratio (3) will
change with the ratio of output, s. For any chosen s, however, we can
make a percentage change in outputs correspond to the same percent-
age change in z. Using the ratios defined by q^I for this purpose, we
take advantage of the ordinal property of the function (1) to transform
it so that

\[
r^{I} Q^{*}(q^{I}) = Q^{*}(r^{I} q^{I}) = Q^{*}(q^{II})
\tag{4}
\]

where r^I is the index of production. In terms of Figure 1, we have
chosen to measure expansion in output in terms of the percentage
changes of outputs on the line [0, r^I q^I].


Equally reasonably, we can use the ratios of output of II for measur-
ing purposes. We transform the function (1) so that

\[
Q^{**}(q^{I}) = \frac{1}{r^{II}}\,Q^{**}(q^{II}) = Q^{**}(q^{II}/r^{II})
\tag{5}
\]

Other choices of definition are possible. We could take a mean of
values of r over a set S of output ratios. Or we could consider sets of
goods which would have been produced if one set of economic forces,
such as demand, had remained constant, while another set, such as
technology, had changed. Attention will be confined to the simpler
indexes r^I and r^II.
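Definitions (4) and (5) only require finding a scale factor along a ray, which a bisection search does for any transformation function that increases along rays from the origin. The sketch below uses two hypothetical functions, neither taken from the paper: a linearly homogeneous one, for which r^I and r^II coincide with the ratio (3), and a non-homothetic one, for which they differ. The function names and output vectors are illustrative assumptions.

```python
import math


def scale_to_level(Q, q, level, tol=1e-12):
    """Find t > 0 with Q(t * q) = level by bisection, assuming Q(t * q) increases with t
    and Q(0) < level."""
    lo, hi = 0.0, 1.0
    while Q([hi * x for x in q]) < level:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if Q([mid * x for x in q]) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def index_r_I(Q, q_I, q_II):
    """Definition (4): r^I scales q^I up to the output level of q^II."""
    return scale_to_level(Q, q_I, Q(q_II))


def index_r_II(Q, q_I, q_II):
    """Definition (5): q^II / r^II lies at the output level of q^I."""
    return 1.0 / scale_to_level(Q, q_II, Q(q_I))


def Q_homogeneous(q):
    """Hypothetical transformation function, homogeneous of degree one."""
    return math.sqrt(sum(x * x for x in q))


def Q_non_homothetic(q):
    """Hypothetical transformation function whose constant-product curves are not homothetic."""
    return (1 + q[0]) * (1 + q[1]) - 1


print(index_r_I(Q_homogeneous, [3.0, 4.0], [8.0, 6.0]),
      index_r_II(Q_homogeneous, [3.0, 4.0], [8.0, 6.0]))
print(index_r_I(Q_non_homothetic, [1.0, 1.0], [3.0, 0.5]),
      index_r_II(Q_non_homothetic, [1.0, 1.0], [3.0, 0.5]))
```

For the non-homothetic function the two indexes come out near 1.449 and 1.5, so the choice between (4) and (5) matters even with exact knowledge of Q.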


3. USE OF PRICE DATA 


In the absence of direct knowledge about the shape of the trans-
formation function, it is possible to infer some knowledge from price
information. Consider two products, q_1 and q_2, produced by different
firms, and selling at prices p_1 and p_2. The goods are produced with the
factors of production a_i, i = 1, ..., m, selling for prices b_i. For the
first firm, we assume the following functions:

\[ q_1 = q_1(a_{i1}), \quad \text{production function possessing partial derivatives.} \tag{6} \]

\[ p_1 = p_1(q_1), \quad \text{demand for } q_1. \tag{7} \]

\[ b_{i1} = b_{i1}(a_{i1}), \quad i = 1, \ldots, m; \ \text{supply functions for factors.} \tag{8} \]

\[ t_1 = t_1(a_{i1}, b_{i1}, p_1, q_1), \quad \text{tax function.} \tag{9} \]

A similar set of functions is assumed for q_2.
Necessary conditions for maximum profits for firm 1 are:

\[
p_1\frac{\partial q_1}{\partial a_{i1}} + q_1\frac{\partial p_1}{\partial q_1}\frac{\partial q_1}{\partial a_{i1}}
- b_{i1} - a_{i1}\frac{\partial b_{i1}}{\partial a_{i1}} - \frac{\partial t_1}{\partial a_{i1}} = 0,
\qquad i = 1, \ldots, m.
\tag{10}
\]

We are interested in the marginal rate of substitution of q_1 and q_2.
The condition necessary in order that dq_1/dq_2 = p_2/p_1, when da_{i1} = −da_{i2},
i = 1, ..., m, and the conditions (10) are fulfilled is

\[
\frac{d(b_{i1}a_{i1}) + dt_1}{1 - 1/\eta_1} = -\,\frac{d(b_{i2}a_{i2}) + dt_2}{1 - 1/\eta_2}
\tag{11}
\]

where η_1 and η_2 are elasticities of demand, d(ba) represents the cost
of the factors da_i, and dt is the tax change caused by the shift of factors.
When monopoly and tax influences are the same in the sense of (11),
the ratios of prices equal the marginal rates of substitution
of q_1 and q_2.

If information on the monopolistic influences or tax is available, 
prices may properly be adjusted to allow for such disturbances. Such 
adjusted prices will better measure marginal rates of substitution and 
will be more appropriate for use in index number construction. Allow- 
ance may also be made for dynamic disturbance to price, such as 
results from unforeseen changes in demand. Some problems arise, 
however, in deciding just which dynamic influences should be ab- 
stracted and which should not. These will be touched on below in 
section 5. 


4. COMPUTATION OF THE INDEX WITH PRICE DATA 








Assuming that the marginal rates of substitution in production
equal the ratios of prices in each time period, we have for each equi-
librium point

\[
Q_i(q)/p_i = \lambda
\tag{12}
\]

where λ is constant and Q_i is the partial derivative of Q(q) with respect
to q_i. Using the definition (4) of an output index and expanding by
Taylor's Theorem, we have

\[
Q(r^{I}q^{I}) = Q(q^{II}) + \sum_i Q_i(q^{II})\left(r^{I}q_i^{I} - q_i^{II}\right) + R_1
\tag{13}
\]

where the remainder is

\[
R_1 = \frac{1}{2!}\sum_i\sum_j Q_{ij}(q^{*})\left(q_i^{*} - q_i^{II}\right)\left(q_j^{*} - q_j^{II}\right),
\qquad q_i^{II} \le q_i^{*} \le r^{I}q_i^{I}.
\tag{14}
\]

Noting that the values of Q are the same, and substituting from (12),
we have, upon solving for r^I:


\[
r^{I} = \frac{\sum_i p_i^{II}q_i^{II}}{\sum_i p_i^{II}q_i^{I}} - \frac{R_1}{\lambda\sum_i p_i^{II}q_i^{I}}
\tag{15}
\]










Dropping the remainder, we obtain the Paasche quantity formula,
derived by using a linear approximation of the product curve II and
choosing the proportions of commodities of period I for comparison
purposes. In terms of Figure 1, we have determined a point A as an
approximation of r^I q^I.

The remainder can be interpreted in terms of the marginal rates 
of substitution at the point where the remainder is evaluated. If 
diminishing marginal rates of substitution predominate, the remainder 
is positive. A diminishing-rate of substitution for two goods exists 
if Qi:>0 and Q;;<0. Since the set rg,’ and the set gq,’ represent the 
same production level, some rq; <q,// and some rq7>q1'". It follows that 
many of the terms in the remainder are positive if diminishing mar- 
ginal rates of substitution exist, namely the terms 7=j, and the terms , 
for which (q:i*—q,! ) and (q;*—q,/) have opposite signs. Other terms 
may be negative, but these need not exist, and the notion that di- 
minishing substitutions prevail implies that they are outweighed by 
the positive terms. In the two-variable case R,>0. Conversely, if 
increasing rates of substitution are predominant at the point where 
the remainder is evaluated, R;<0. This latter case is illustrated by 
point A in Figure 1. 

If R,>0, the Paasche index is too large and gives an upper bound 
when comparison is made by proportions of period I. If R,<0, the 
Paasche index is too small. 
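A small numerical check of the sign argument, under assumed values that do not come from the paper: a hypothetical two-good transformation function $Q(q) = q'Aq$ whose matrix $A$ gives $Q_{ii} > 0$ and $Q_{ij} < 0$, with prices taken proportional to the $Q_i$ as in (12). For a quadratic, the second-order Taylor expansion (13) is exact, so the remainder can be computed directly; in this example it comes out positive and the Paasche formula lies above the true $r^I$. The names `A`, `q1`, `q2`, and `lam` are illustrative only.

```python
import math

# Hypothetical transformation function Q(q) = q'Aq (an assumption for illustration).
# A is symmetric positive definite with a negative off-diagonal entry,
# so Q_ii > 0 and Q_ij < 0: diminishing rates of substitution.
A = [[2.0, -0.5],
     [-0.5, 1.0]]


def Q(q):
    return sum(q[i] * A[i][j] * q[j] for i in range(2) for j in range(2))


def grad_Q(q):
    """Partial derivatives Q_i(q) = 2 (Aq)_i."""
    return [2.0 * sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


q1 = [10.0, 8.0]    # period I quantities (hypothetical)
q2 = [8.0, 12.0]    # period II quantities (hypothetical)
lam = 2.0           # prices p = Aq satisfy Q_i / p_i = 2, as in (12)
p2 = [g / lam for g in grad_Q(q2)]

# Q is homogeneous of degree 2, so Q(r q1) = Q(q2) gives r directly.
r_true = math.sqrt(Q(q2) / Q(q1))

# Remainder of the Taylor expansion about q2 (exact for a quadratic).
h = [r_true * a - b for a, b in zip(q1, q2)]
R_I = Q([r_true * a for a in q1]) - Q(q2) - dot(grad_Q(q2), h)

paasche = dot(p2, q2) / dot(p2, q1)
r_solved = paasche - R_I / (lam * dot(p2, q1))   # r obtained by solving (13)

print(f"R_I = {R_I:.4f}, Paasche = {paasche:.4f}, "
      f"r = {r_true:.4f}, solved r = {r_solved:.4f}")
```

Dropping `R_I` from the last expression leaves the Paasche formula, which overstates `r_true` by exactly `R_I / (lam * dot(p2, q1))`.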

If we use the ratios of commodities of period II for our comparison of outputs, we adopt the definition (5) as our index, obtaining

$$\frac{1}{r^{II}} = \frac{\sum_i p_i^{I} q_i^{I}}{\sum_i p_i^{I} q_i^{II}} - \frac{R_{II}}{\mu \sum_i p_i^{I} q_i^{II}}, \qquad (16)$$

where $R_{II}$ is the remainder and $\mu$ is a positive constant. The Laspeyres quantity formula is a linear approximation to the period I product curve, using period II ratios of commodities. If diminishing rates of substitution prevail, the Laspeyres index is too small, giving a lower bound, while if increasing rates of substitution predominate, the Laspeyres index is too large, yielding an upper bound.

The two indexes $r^I$ and $r^{II}$ are not generally equal. Hence we cannot combine the upper and lower bounds of the two indexes to obtain limits for the indexes. If the function $Q$ is homothetic over the range of quantities involved, the two indexes must be equal, and the index will lie between the Paasche and Laspeyres indexes. If the Paasche index is smaller than the Laspeyres index, increasing marginal rates of substitution prevail, and conversely. Regardless of which of the two is smaller, an average of the Paasche and Laspeyres indexes will probably give a better approximation than either limit. Indeed, a geometric mean of the two indexes, which is the Fisher “ideal” formula, is the correct index in the quadratic case, and since a quadratic approximation to the function is generally better than a linear one, this average will usually be better than the Laspeyres or Paasche index in the homothetic case.
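The claim about the Fisher formula can be checked numerically. This sketch assumes a hypothetical quadratic, homothetic transformation function $Q(q) = q'Aq$ (matrix and quantities invented for illustration), sets prices equal to $Aq$ so that they are proportional to the partial derivatives, and compares the three formulas with the true index $r$ defined by $Q(r q^I) = Q(q^{II})$.

```python
import math

# Hypothetical homothetic, quadratic transformation function Q(q) = q'Aq.
A = [[2.0, -0.5],
     [-0.5, 1.0]]


def quad_form(x, y):
    """x'Ay for the matrix A above."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))


def prices(q):
    """Prices set to Aq, i.e. proportional to the partial derivatives Q_i."""
    return [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


q1, q2 = [10.0, 8.0], [8.0, 12.0]   # hypothetical period I and II quantities
p1, p2 = prices(q1), prices(q2)

laspeyres = dot(p1, q2) / dot(p1, q1)
paasche = dot(p2, q2) / dot(p2, q1)
fisher = math.sqrt(laspeyres * paasche)

# True index: Q(r q1) = Q(q2), and Q is homogeneous of degree 2.
r_true = math.sqrt(quad_form(q2, q2) / quad_form(q1, q1))

print(f"Laspeyres {laspeyres:.4f}  Paasche {paasche:.4f}  "
      f"Fisher {fisher:.4f}  true {r_true:.4f}")
```

Because $Q$ here is exactly quadratic, the geometric mean reproduces $r$ up to rounding error, and the Laspeyres and Paasche values bracket it. For a non-quadratic $Q$ the Fisher formula would only approximate $r$.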

These derivations have not dealt with the case of index numbers 
with price weights chosen, not from the base or the given year period, 
but from some third period. This possibility is one in which the linear 
approximations of the product curves are achieved with a different set 
of prices. No new principle is involved, and the procedure can be justi- 
fied where the approximation can be justified, as in the cases con- 
sidered. 


5. INTERPRETATION OF THE TRANSFORMATION FUNCTION 


The transformation function is an ordered set of constant product 
functions, each of which may refer to very different conditions of quan- 
tities of resources, technology, and the like. The transformation func- 
tion may be far from homothetic, and indeed the constant product 
functions might intersect (cf. Kuznets [7], p. 116), rendering the order- 
ing more or less arbitrary. When two constant product curves are far 
from homothetic, the two indexes $r^I$ and $r^{II}$ differ considerably, if the
variations in output ratios are also considerable. Where this occurs, 
the difference warns us that the productive mechanism has so changed 
that comparisons based upon it are not precise. 

Changes in technology are perhaps the most likely cause of such 
non-comparability. If technology changes for certain goods much more 
than for others, the marginal rates of substitution are altered. The same 
can be said for natural resource exhaustion or discovery, and changes 
in the terms of foreign trade. If some goods are affected more than 
others, the marginal rates of substitution are changed, and the con- 
stant product curves will not be homothetic. 

Difficulties arising from the specific nature of capital goods and 
labor skills may not be so inescapable. The difficulties are that specific sets of capital goods and labor skills employed in a particular
period may have sharply diminishing returns in certain directions of 
substitution. This fact itself does not render the constant product 
functions non-homothetic, but if the two periods involved have dif- 
ferent kinds and quantities of capital goods, the diminishing returns are 
likely to occur at different output ratios, and this will make the functions non-homothetic. To avoid these difficulties, we may introduce
some type of transformation functions for these items. We are searching 
for production equivalents of final outputs, and it does not seem 
unreasonable to allow the specific form of inputs to vary as well. The 
following procedure is suggested as a possibility. 

a. Select from the economy of a period (say I) a set of firms and labor 
representing capital goods industries. Call this C. 

b. Determine the minimum time $T_C$ that it would take C to construct capital equipment as efficient as the actual equipment in producing $q^I$ when used with the actual natural resources, technology, and labor of I.

c. Select a set of training schools from the economy of I. Call this W.

d. Determine the minimum time $T_W$ which would be required for W to produce skills as efficient as those of I when used with the capital, natural resources, and technology of I to produce $q^I$.

e. For any ratio of outputs, $s$, determine an output equivalent to $q^I$ as the amounts $q_i = k s_i$ ($k$ constant) possible with the natural resources and technology of I, and the best set of capital goods and labor skills that could be produced with C and W in the times $T_C$ and $T_W$.

In addition to rendering capital goods and labor skills more flexible, 
this construction is designed to avoid a complication arising from dy- 
namic factors. The productive resources at any time include mistakes, 
that is, capital and human resources which have been produced at a
greater cost than they prove to be worth. When we consider alterna- 
tive products of such resources, we should consider not the physical 
alternative, but something less than that. To cite an extreme case, the 
alternative product of a ghost city should be counted as zero. If we 
use the notion of most efficient forms (determined by minimum construction time), our alternative products cannot utilize the misdirected energies of the past.

A similar problem arises in the transformation function itself, if 
experience and development are considered as productive factors. In 
the application of existing knowledge to a new purpose, some experience
is required for maximum efficiency. If a given situation is not in equi- 
librium for this reason, the alternative product of the output of the 
period should make allowance for this lack, if we choose to consider 
experience and development as a productive factor. This problem 
arises in comparing war output with peacetime output. Peacetime 
output usually has the benefit of much more experience and development than does war output, and if we refuse to recognize this productive factor, war output has a smaller peacetime equivalent than
if we do. It is perhaps impossible to say which is the correct pro- 
cedure, since different meanings are possible for the transformation 
function, and for the indexes of output based upon it. However, it 
seems difficult to deny the importance or relevance of experience and 
development. Dynamic influences upon production which are the re- 
sult of the lack of economic foresight should be abstracted from a 
measure of production based on output possibilities. Dynamic in- 
fluences such as experience, which are part of the production process, 
might very properly be allowed to influence production measures. 
SAMPLING 1949 CORPORATION INCOME TAX RETURNS
A. C. Rosander,¹ R. H. Blythe, Jr.,² and D. W. Johnson*


THE PROBLEM 


The Bureau of Internal Revenue and the Federal Trade Commission are both interested in corporation data, the former for purposes of taxation, the latter for financial reporting. In 1949 there were about 600,000 active corporations filing income tax returns with the Bureau. To tabulate all of these returns annually is a large undertaking; to audit all of them is an even larger undertaking. Hence these agencies
have turned their attention more and more to sampling as a solution 
to the problem of quickly getting useful operating and research data 
on corporations. 

The Bureau of Internal Revenue has used probability sampling in 
connection with tabulations of individual income tax returns and 
partnership returns of income for some time. It has used sampling of 
corporations only for special studies and not as a continuous policy. 
It was decided to draw a sample of corporation income tax returns for 
1949 for three reasons: a sample of the corporations with assets under 
$250,000 could be used as the basis of an audit program to throw light 
on an area of tax audit about which the Bureau has incomplete in- 
formation; a sample could be used to test the feasibility of shifting 
the tabulation of corporation statistics, now reported in Statistics of 
Income, Part 2, from a tabulation based on all returns, as at present, 
to a tabulation based upon a designed probability sample; a sample 
could be used as the basis of periodic special studies which the Bureau is called upon to make.

The Federal Trade Commission, in connection with the Securities 
and Exchange Commission, publishes the Quarterly Industrial Finan- 
cial Report based upon a sample of corporations engaged in manufac- 
turing in 1943. It was highly desirable that this sample be revised and 
drawn from the most current and complete list of corporations avail- 
able. It was important also that the total assets and industrial classi- 
fication of every corporation be known so that these two factors could 
be used for stratification of the population. These requirements pointed 
to the income tax returns of the Bureau of Internal Revenue as the 
most logical frame for the sample. An executive order was obtained 
giving the Federal Trade Commission access to the income tax returns; 





¹ Bureau of Internal Revenue.
² Federal Trade Commission.
the purpose was not to examine the return in detail but to obtain a
list of names and addresses and a few basic codes and figures for 
sampling purposes only. 

The problem then became one of devising a scheme for drawing two 
different samples which would contain no duplicates from a population 
of approximately 600,000 income tax returns. There are two physical 
characteristics of this population of tax returns which are controlling 
factors in a sampling scheme. First, the returns are not physically 
located in a static arrangement of file drawers until several years after 
they have been originally filed by the corporations, because large 
numbers of returns are being used in current operations. However, 
they all flow through a series of processing steps within the Washing- 
ton office of the Bureau of Internal Revenue during the year following 
their filing. Second, the flow of the returns through the Bureau is in 
the form of bundles of from one to a hundred returns, bundles which in 
general must remain intact, and therefore cannot be rearranged into 
strata before sampling. In general the only characteristic common to 
all returns in a bundle is the collection district in which they were orig- 
inally filed. The bundles contain a mixture of asset sizes and indus- 
tries. 

In addition to the problems posed by the physical characteristics 
of the flow of returns, there were serious limitations on available 
space and personnel, as well as the necessity for a scheme which would 
not become a bottle-neck in the over-all processing operations. 


DESIGN OF THE SAMPLE 


The reasoning which led to the selection of the particular sample 
size and design need not be developed here because it contains nothing 
new or unusual; however, an outline of the final design follows. The 
samples for both agencies were of the stratified, random type. The 
basis for stratification was multiple, namely, asset size and industrial
classification. There were eleven asset classes and 41 industry groups. 
The industry groups for the most part were the Bureau of Internal 
Revenue two-digit classes which correspond very closely to the 
standard industrial classification; in some instances a number of 
two-digit groups were combined and in a few other cases three-digit 
industries were used. The allocation of the sample to the asset size 
groups was proportional to the product of the number of corporations 





* A collection district corresponds to a State or Territory, except in 8 States which have more than 
one collection district, and except in the case of Alaska which is part of the State of Washington col- 
lection district. 





1949 CORPORATION INCOME TAX 235 


in the population in that group and the standard deviation of the 
income before taxes. This is the usual form of allocation in order to 
attain the minimum sampling error in the estimate of total income 
before taxes. Both the number of corporations and the standard devi- 
ation in each stratum were preliminary estimates based on data from 
previous years. The allocation to the industry groups was simply in 
proportion to the number of companies in the industry; the same 
ratio was applied to all industries in an asset stratum. The justification 
for this was the assumption, borne out in previous years, that the 
standard deviations in the different industry groups of the same asset
class would be equal.
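The allocation rule described above (sample size in each asset class proportional to $N_h S_h$, then a common sampling ratio across the industries within an asset class) can be sketched as follows. The function names and all figures are hypothetical, not the 1949 values of Table 1, and a working plan would also round the results to whole returns.

```python
def neyman_allocation(n_total, pop_counts, std_devs):
    """Split n_total sample returns among asset classes in proportion to N_h * S_h."""
    products = [N_h * S_h for N_h, S_h in zip(pop_counts, std_devs)]
    total = sum(products)
    return [n_total * p / total for p in products]


def industry_allocation(n_class, industry_counts):
    """Apply one asset class's sampling ratio n_h / N_h to every industry in it."""
    ratio = n_class / sum(industry_counts)
    return [ratio * N_hj for N_hj in industry_counts]


# Hypothetical inputs (not the 1949 figures): three asset classes.
N = [300_000, 120_000, 40_000]   # estimated corporations in each asset class
S = [5.0, 20.0, 80.0]            # preliminary std. dev. of income before taxes
n_by_class = neyman_allocation(30_000, N, S)

# Split the first asset class among three hypothetical industry groups.
industries = [150_000, 100_000, 50_000]
by_industry = industry_allocation(n_by_class[0], industries)

for N_h, n_h in zip(N, n_by_class):
    print(f"N = {N_h:>7,}  n = {n_h:8.1f}  rate = {n_h / N_h:.4f}")
```

The sampling rate rises steeply with asset size because both the standard deviation and the product $N_h S_h$ shift weight toward the larger corporations; classes where the computed size reached $N_h$ would simply be taken completely.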

The application of these design principles yielded the sample plan 
shown in Table 1. 


TABLE 1 


ESTIMATED CORPORATION POPULATION* AND ALLOCATION 
OF CORPORATION SAMPLES— 1949 








Class limits              Estimated      B.I.R. sample         F.T.C. sample
(thousands of dollars)    population     Rate     Number       Rate     Number
50-100
100-250
250-500
500-1,000
1,000-5,000
5,000 and over
Total                                             15,000

* Active corporations filing a corporation income tax return.
† Returns with incomplete balance sheets, or with no balance sheets.


From this table it is apparent that the B.I.R. sample consisted of 
approximately 15,000 corporations restricted to those with assets of 
less than $250,000, and to those with incomplete or no balance sheet 
data. The F.T.C. sample, on the other hand, covered all size classes 
and totaled about 64,000 companies. 

Since there were 41 industry groups, the table represents 7 × 41, or 287, different strata from which samples had to be drawn. (Corporations with assets of $5,000,000 and over are not regarded as a sampling stratum, since all of them were included in the sample.)

The problem of actually selecting the sample was complicated be- 
cause all of the tax returns were not separable into strata before the 
sampling began, and because the true population numbers in each 
stratum were not known but had to be developed as part of the sampling process. Thus it was necessary to maintain a running count of the total number of returns in each of 287 strata and to select the sample returns at the proper uniformly spaced intervals. A systematic, uniform sampling interval was used as a practical approximation to truly random selection. This tends to give a small degree of automatic stratification on a geographic basis, because each bundle consists of returns from the same collector's office. But it is effectively random with respect to asset size and industry because the bundles are composed of mixtures of tax returns by sizes and industries.
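The running count and uniformly spaced selection can be sketched as a per-stratum counter. This is an illustration, not the Bureau's procedure, which used physical card decks; the random start within the first interval, the stratum keys, and the function name `systematic_selector` are assumptions made for the example.

```python
import random
from collections import defaultdict


def systematic_selector(intervals, seed=None):
    """Build a selector that keeps a running count per stratum.

    intervals maps a stratum key to k, meaning "select 1 return in k".
    A random start between 1 and k is drawn for each stratum (an assumption);
    after that, every k-th return counted in the stratum is selected.
    Returns the selector function and the dictionary of running counts.
    """
    rng = random.Random(seed)
    start = {stratum: rng.randint(1, k) for stratum, k in intervals.items()}
    count = defaultdict(int)

    def select(stratum):
        count[stratum] += 1
        c, s, k = count[stratum], start[stratum], intervals[stratum]
        return c >= s and (c - s) % k == 0

    return select, count


# Hypothetical strata keyed by (asset class, industry) with intervals 10 and 4.
select, count = systematic_selector({(1, "20"): 10, (2, "35"): 4}, seed=7)
stream = [(1, "20"), (2, "35")] * 1000   # returns arriving in bundle order
selected = defaultdict(int)
for stratum in stream:
    if select(stratum):
        selected[stratum] += 1
print(dict(selected))
```

With 1,000 returns in each stratum, the selector picks exactly 100 and 250 returns respectively, whatever the random start; only the positions of the selected returns depend on the start.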


THE SOLUTION OF THE SAMPLING PROBLEM 


The problem of counting all returns and identifying every i-th 
return in 287 strata was solved by using decks of punch cards which 
were racked in specially built boxes to be described later. A separate 
deck of punch cards was prepared for each of the 287 strata. The 
cards were punched with four codes as follows: (1) the asset size code; 
(2) the B.I.R. industry code, which was generally a two-digit code but 
sometimes was a three-digit or else a combination of two or more two- 
digit codes; (3) a simple two-digit industry code used for machine 
sorting and tabulation control; and (4) a sample code to indicate 
whether that card represented a non-sample tax return, a sample for 
the B.I.R., or a sample for the F.T.C. In addition, the sample return 
was indicated by a color scheme; white cards were used for non-sample 
cases, blue cards for the F.T.C. sample returns, and red for the B.I.R.
sample. The cards were run through an IBM interpreter which printed 
the punched code at the top of each card. This readable information 
was used in the verification step described later. The colored cards 
were interspersed with the white ones at evenly spaced intervals cor- 
responding to the proper sampling ratios, so that a return would never 
fall in both samples. 
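A deck for one stratum can be modeled as a list of card colors. In this hypothetical sketch, blue (F.T.C.) cards are placed every `ftc_interval` cards and red (B.I.R.) cards every `bir_interval` cards; since each card has exactly one color, no return can be drawn into both samples. The rule for a red card whose target position is already blue (move it to the nearest white card) is an assumption, since the paper does not say how such coincidences were handled.

```python
def nearest_white(deck, pos):
    """Index of the white card closest to pos (forward side checked first), or None."""
    for offset in range(len(deck)):
        for cand in (pos + offset, pos - offset):
            if 0 <= cand < len(deck) and deck[cand] == "white":
                return cand
    return None


def build_deck(size, ftc_interval=None, bir_interval=None):
    """Return the card colors for one stratum's deck, top card first.

    'blue' marks an F.T.C. sample return, 'red' a B.I.R. sample return,
    'white' a non-sample return. Each position holds one card, so the two
    samples cannot overlap.
    """
    deck = ["white"] * size
    if ftc_interval:
        for pos in range(ftc_interval - 1, size, ftc_interval):
            deck[pos] = "blue"
    if bir_interval:
        for target in range(bir_interval - 1, size, bir_interval):
            pos = nearest_white(deck, target)
            if pos is not None:
                deck[pos] = "red"
    return deck


# Hypothetical stratum: 400 cards, F.T.C. rate 1 in 4, B.I.R. rate 1 in 10.
deck = build_deck(400, ftc_interval=4, bir_interval=10)
print(deck.count("blue"), deck.count("red"), deck.count("white"))
```

For strata above the B.I.R. asset limit, `bir_interval` would simply be left as `None`, giving a deck with blue and white cards only.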

The decks of punch cards were then racked in the special boxes. 
One box was built for each of the seven asset classes which had to be 
sampled. The over-all dimensions were about 25 inches high, and 21 
inches wide, with a sloping front. The box was divided into 45 cells or 
pigeon holes, the inside dimensions of each cell being planned to ac- 
commodate a deck of about 400 punch cards. The arrangement was 
5 cells wide by 9 cells high. 

A separate box was set up for each of the seven asset classes. A 
pigeon hole was assigned to each of the 41 industry groups. The same 
arrangement of industries was used in each box. The industry code 
number was indicated below each cell. When the boxes were loaded
with the decks of cards placed in the proper cells, sampling was ready 
to begin. 

OPERATION OF THE SAMPLING SYSTEM 


In the processing steps occurring before the sampling procedure, 
the tax returns were coded for asset size and industry, and the code 
numbers written in a specified position on the face of the tax return. 
In addition, the returns in each bundle were put in order according to 
asset size, i.e., asset 1 returns were on top of the bundle, asset 2 returns 
were next and so on. This was the only ordering within the bundle and 
no returns were ever interchanged between bundles. Since a bundle 
might contain from one to a hundred tax returns, there was no assur- 
ance that all asset classes would be represented in every bundle. The 
number in any class varied greatly from bundle to bundle. 

In actual operation of the system, a bundle of tax returns was turned 
over first to the person responsible for the asset 1 box. He read the 
asset code on the top return of the bundle and if it was asset 1, he 
then read the industry code and withdrew from the pigeon hole labeled 
with that industry code the punch card on top of that deck. He then 
added a code to the face of the tax return to indicate the sample 
status of that return. If the punch card he had drawn was white, a 
code number indicating a non-sample corporation was written in a 
standard position on the tax return. If the punch card was red a dif- 
ferent number was used to indicate a B.I.R. sample return; while if 
the punch card was blue a third code was used to indicate an F.T.C. 
sample. 

The punch cards withdrawn from the boxes were stacked in order 
in an empty pigeon hole. When the operator of box 1 withdrew a 
punch card to correspond with every asset class 1 return in the bundle, 
he attached the selected punch cards to the top of the tax return bundle 
and turned the pile over to the operator of the next box, that for asset 
code 2. This person proceeded in exactly the same fashion for all asset 2 
companies in the bundle. In this way, the bundle passed in turn to 
each of the boxes; if the bundle did not contain any returns for a par- 
ticular asset class, it was moved to the next box. Finally the bundle 
reached the verification stage. At this point, the verifier made three 
checks as follows: 

1. Had a punch card been drawn for every return in the bundle? 

2. Did the asset and industry codes on the interpreted punch card agree with those on the tax return it represented?
3. Did the sampling code put on the tax return agree with the color
of the punch card? 


These checks were made possible by maintaining the tax returns and
the punch cards in the same order in which they were drawn. Errors 
were immediately called to the attention of the individuals who made 
them. 
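The verifier's three checks amount to comparing two parallel sequences kept in drawing order: the returns of a bundle and the cards drawn for them. In this sketch the record types, field names, and the labels `"non-sample"`, `"BIR"`, and `"FTC"` are hypothetical stand-ins for the code numbers actually written on the returns.

```python
from dataclasses import dataclass

# Hypothetical sample-code labels and the card color each should match.
COLOR_FOR_CODE = {"non-sample": "white", "BIR": "red", "FTC": "blue"}


@dataclass
class TaxReturn:
    asset: int
    industry: str
    sample_code: str


@dataclass
class PunchCard:
    asset: int
    industry: str
    color: str


def verify_bundle(returns, cards):
    """Run the three verification checks; return a list of error messages."""
    errors = []
    # Check 1: one card drawn for every return in the bundle.
    if len(returns) != len(cards):
        errors.append(f"check 1: {len(returns)} returns but {len(cards)} cards")
    for i, (ret, card) in enumerate(zip(returns, cards)):
        # Check 2: asset and industry codes agree.
        if (ret.asset, ret.industry) != (card.asset, card.industry):
            errors.append(f"check 2: item {i} has mismatched codes")
        # Check 3: sample code on the return agrees with the card color.
        if COLOR_FOR_CODE.get(ret.sample_code) != card.color:
            errors.append(f"check 3: item {i} sample code does not match card color")
    return errors


good_returns = [TaxReturn(1, "20", "FTC"), TaxReturn(1, "35", "non-sample")]
good_cards = [PunchCard(1, "20", "blue"), PunchCard(1, "35", "white")]
bad_returns = [TaxReturn(1, "20", "BIR"), TaxReturn(1, "35", "non-sample")]
print(verify_bundle(good_returns, good_cards))
print(verify_bundle(bad_returns, good_cards))
```

The pairing by position is what makes checks 2 and 3 possible, which is why the returns and cards had to stay in the order in which the cards were drawn.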

After verification, the verifier separated physically those returns 
coded as B.I.R. samples and prepared special routing directions for 
them. He also put the F.T.C. sample returns on the top of the remain- 
ing pile and turned the bundle over to an F.T.C. worker who copied 
onto special cards the name, address and certain numerical data for 
the F.T.C. sample cases. 


CONTROL OF THE SAMPLING PROCESS 


At the end of each day’s work, the punch cards withdrawn during 
the day were picked up and taken to the machine tabulation unit. 
Here they were sorted and tabulated by asset class and industry. The 
tabulation showed the total number of returns drawn in each industry 
and asset class, the number of returns drawn into the B.I.R. sample, 
and the number of returns drawn into the F.T.C. sample. Simultaneously with the tabulation a set of summary cards were punched which
then were tabulated to produce a cumulative count to date and at 
the end of the job. 

The cards were then rearranged into the decks, and the colored 
cards inserted at the proper sampling intervals. This insertion of the 
colored cards was the most time-consuming part of the control process. 
It could have been done mechanically by the card-counting device on 
the collator, but generally the decks were too small to make this ef- 
ficient. 

These daily counts and periodic cumulative summaries also per- 
mitted a check of the sampling ratios for each asset class to make sure 
that everything was going according to plan. 
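The daily tabulation and the check on sampling ratios amount to counting cards by stratum and color, accumulating the counts, and comparing the cumulative sample count with the cumulative total. The sketch below uses Python's `collections.Counter`; the tuple layout `(asset_class, industry, color)` and the two days of sample data are assumptions for illustration.

```python
from collections import Counter


def daily_tabulation(cards):
    """Tabulate one day's withdrawn cards by (asset class, industry).

    cards: iterable of (asset_class, industry, color) tuples, where color is
    'white' (non-sample), 'red' (B.I.R. sample) or 'blue' (F.T.C. sample).
    Returns three Counters: all returns drawn, B.I.R. sample, F.T.C. sample.
    """
    total, bir, ftc = Counter(), Counter(), Counter()
    for asset, industry, color in cards:
        key = (asset, industry)
        total[key] += 1
        if color == "red":
            bir[key] += 1
        elif color == "blue":
            ftc[key] += 1
    return total, bir, ftc


def realized_ratio(total, sample, asset_class):
    """Observed sampling ratio for one asset class, pooled over industries."""
    n = sum(v for (a, _), v in sample.items() if a == asset_class)
    N = sum(v for (a, _), v in total.items() if a == asset_class)
    return n / N if N else 0.0


day1 = [(1, "20", "white"), (1, "20", "blue"), (2, "35", "red"), (2, "35", "white")]
day2 = [(1, "20", "white"), (1, "20", "white"), (2, "35", "white")]

# Cumulative counts to date, updated after each day's tabulation.
cum_total, cum_bir, cum_ftc = Counter(), Counter(), Counter()
for day in (day1, day2):
    total, bir, ftc = daily_tabulation(day)
    cum_total.update(total)
    cum_bir.update(bir)
    cum_ftc.update(ftc)

print("F.T.C. ratio, asset class 1:", realized_ratio(cum_total, cum_ftc, 1))
print("B.I.R. ratio, asset class 2:", realized_ratio(cum_total, cum_bir, 2))
```

In practice the realized ratio would be compared with the planned rate for each asset class; with only a few days of data the two can differ noticeably, since the colored cards fall at fixed intervals within each deck.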


MANPOWER AND PRODUCTION 


For convenience of other processing steps, the tax returns were
divided into two groups before they reached the sampling point: asset 
classes 1, 2, 3, 4 and 12 were put in one group, and asset classes 5 and 
over were put in the other. Good production data are available for the 
former group. A crew of five could average better than 3,000 tax re- 
turns per day when the flow of returns from prior processing steps was 
sufficient to sustain this rate. The crew consisted of one person handling
asset class 1, one person for classes 2 and 3, a third for classes 4 and 12, 
one verifier, and the fifth person transcribing data for the F.T.C. 
sample cases. The assignment of more than one asset class to a person 
was done to even the work load as the number of returns, according to 
Table 1, was not equal for all classes. However, most of the time a 
crew of 3 or 4 was used because the preceding operation did not pro- 
duce as many as 3,000 returns per day. 

If necessary, a greater rate of production could be attained by assigning only one asset class to a person. However, this would result in
some lost time by those workers handling the classes with the smaller 
numbers of returns, although this might be alleviated by a system of 
shifting assignments or by multiple boxes of cards for the classes with 
the greater numbers of returns. 


ERRORS 


One verifier was able to verify all the work of all the persons working 
at the sampling boxes and, therefore, one hundred per cent verification 
was used, although a sampling procedure could probably have been 
used. 

Three kinds of errors could be made at the sampling boxes: no card 
could be drawn, a card could be drawn from the wrong industry 
(wrong pigeon hole) or from the wrong asset class (wrong box), or 
the wrong sample code could be entered on the return. These are the 
kinds of errors included in Table 2. The only other kind of error en- 
countered was that due to wrong classification in the first instance, 
such as the wrong tax year, wrong industry code, or wrong asset size, 
and although these cases were rare, they were corrected when found. 


TABLE 2 
NUMBER OF ERRORS PER 1,000 TAX RETURNS MADE IN SELECTING THE SAMPLE 








Days                        Weeks                Weighted daily averages
Monday
Tuesday
Wednesday
Thursday
Friday
Weighted weekly averages    8.1
Table 2 shows the average number of errors per 1,000 returns for
each of 5 days for a period of eight weeks. The weeks are not always 
consecutive due to shut-downs to allow prior processing stages to 
catch up. 

The differences between weeks are significant, but this is largely a 
reflection of the high error rates for the first two weeks which were 
still a part of the learning period. After the fourth week the error rate 
stabilized at about 5 errors per thousand returns. 
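Table 2 reports weighted averages, but the weights are not stated. Assuming each day's rate is weighted by the number of returns handled that day, the weighted average equals the pooled rate, 1,000 × total errors / total returns, as this sketch with invented counts shows.

```python
def errors_per_thousand(errors, returns):
    """Error rate per 1,000 returns for one day."""
    return 1000.0 * errors / returns


def weighted_average_rate(days):
    """Daily rates averaged with weights equal to each day's volume of returns.

    days is a list of (errors, returns) pairs. Weighting by volume makes the
    result equal to the pooled rate 1000 * total errors / total returns.
    """
    total_returns = sum(r for _, r in days)
    return sum(errors_per_thousand(e, r) * r for e, r in days) / total_returns


# Invented counts for one five-day week (not the data behind Table 2).
week = [(24, 3000), (12, 2500), (20, 3200), (9, 2800), (10, 2000)]
print(f"{weighted_average_rate(week):.2f} errors per 1,000 returns")
```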

The daily averages do not deviate significantly from the grand mean 
of 5.4, but they include not only day by day variations as such, but 
also the increases in error rates due to breaks in the work. The highest 
values for Monday and Wednesday appear to be due largely to this 
factor. That a break in the sampling work is a factor in causing error 
is indicated by a comparison of the error rate made on the last day of 
a working period with that made on the first day of the next working 
period when an interval of more than two days (not counting weekends) intervened.


TABLE 3 
NUMBER OF ERRORS PER 1,000 TAX RETURNS 








Last day before shut-down    First day after shut-down    Difference (first day less last day)
Not all of this increase can be charged to shut-down; some of it is 
due to the occasional use of new and less experienced personnel when 
sampling was resumed because previous sampling workers had been 
assigned to other jobs. These data suggest that intermittent work 
tends to increase the error rate. 


FLEXIBILITY 


This system of sampling has great flexibility. For example, different sampling ratios can be used within the industry classes, and this was done in several instances. In the apparel, lumber, printing
and leather goods industries, the preponderance of small companies 
makes it necessary to draw a larger sample of small companies in 
order to yield estimates for these industries with reasonable sampling
errors. By merely inserting the required larger number of colored 
punch cards in the deck, these higher ratios can be achieved without 
the necessity of preparing special instructions. It should be pointed 
out that this variation in rates does complicate the preparation of the 
decks of cards by the machine tabulation unit, and should be avoided 
unless there is a real advantage. 

The system can also be easily adjusted to take care of a varying 
volume of work. If the rate of flow of returns is low, one or two per- 
sons only can operate by shifting successively from one asset class to 
another. 

This system appears feasible for sampling from a large finite popu- 
lation which is not readily accessible, but all elements of which move 
through a common set of operations within which at some point 
sampling can be carried out. 


SUMMARY 


A sampling system which was jointly operated by two cooperating 
government agencies has been described. This system fulfilled the 
needs of both agencies and met the administrative restrictions of 
each. It was designed to permit the simultaneous drawing of two non-duplicated samples from a flow of about 600,000 corporation income tax returns, and it extended over a period of about a year. The sample
was a stratified random design with 287 strata. The total number of 
tax returns in each stratum had to be found as a part of the sampling 
process. 

A feature of this system was the use of decks of standard size punch 
cards as tally cards, sample indicators, and control records. This system 
combined probability sampling design with efficient administrative 
control and functioned smoothly as a part of the regular procedure of 
an operating agency. It was characterized by a relatively high rate of
output with a low rate of error, and resulted in considerable saving 
in total overhead cost. 








PROCEEDINGS 


AMERICAN STATISTICAL ASSOCIATION 
110TH ANNUAL MEETING 


CONGRESS HOTEL, CHICAGO, ILLINOIS 
DECEMBER 27, 1950 


MINUTES OF THE ANNUAL BUSINESS MEETING 


The meeting was presided over by the retiring President, Samuel S. Wilks. 

Morris A. Copeland, Chairman of the 1950 Committee on Fellows, announced 
the election to Fellowship by his Committee of R. G. D. Allen, Harald Cramér,
Harold F. Dorn, A. Ross Eckler, Raymond J. Jessen, A. D. H. Kaplan, Maurice 
G. Kendall, Alexander M. Mood, Oskar Morgenstern, Paul L. Olmstead, Mor- 
timer Spiegelman, P. V. Sukhatme, L. H. C. Tippett, John Wishart and H. C. 
Wold. (The full report of the 1950 Committee on Fellows was published in 
the February 1951 issue of The American Statistician.) 

The Secretary-Treasurer, Samuel Weiss, read the reports of the Secretary- 
Treasurer which are printed separately in this issue of the Journal. 

Frederick F. Stephan, one of the retiring Directors of the Association, read 
the report of the Board of Directors for the year 1950 which is published in this 
issue of the Journal in full. 

Lowell J. Reed reported for the Committee on Elections that the following 
officers had been elected by the membership on the ballot distributed in November 1950: Aryness Joy Wickens, Deputy Commissioner of Labor, Bureau of Labor
Statistics, as President-elect; Morris H. Hansen, Assistant Director for Statistical 
Standards, Bureau of the Census as Vice-President; John W. Tukey, Associate 
Professor of Mathematics at Princeton, and Ralph J. Watkins, Director of 
Research, Dun and Bradstreet, Inc., as Directors. Waite S. Brush, Consolidated 
Edison Company of New York, Inc., Jerome Cornfield, National Cancer Insti- 
tute, U. S. Public Health Service, Paul G. Hoel, Professor of Mathematics, 
University of California at Los Angeles, Palmer O. Johnson, Professor of Edu- 
cation, College of Education, University of Minnesota, Frederick Mosteller, 
Laboratory of Social Relations, Harvard University, J. R. Stockton, Professor 
of Business Statistics and Director of Business Research, University of Texas, 
as District Representatives to the Council. 

After the Report of the Committee on Elections, Lowell J. Reed, the President 
for 1951, assumed the chair and gave the floor to Professor Wilks who read his 
Presidential Address (published in full in the March 1951 issue of this Journal). 

Harold Hotelling, at the request of the Committee on Resolutions, read the
following Resolution which he had drafted in honor of Abraham Wald. 

WHEREAS the death of Professor Abraham Wald, who with Mrs. Wald 
was killed in an airplane crash in India, deprives statistics of a vigorous, 
brilliant, and original contributor to its fundamental ideas; and 

WHEREAS the future of statistical method will be vitally affected by
Abraham Wald’s introduction of a formalized and accurate method of se-
quential analysis, and by his work on the foundations of statistical inference,
including particularly the consideration of loss and risk functions, of general
decision problems, of the minimax principle and the related theory of games,


of the nature of the estimation of unknown quantities, and of the testing of 
hypotheses; and 

WHEREAS the efforts of American industry and the military and naval 
services of supply were materially aided in the successful conduct of the 
Second World War by widespread application of Abraham Wald’s work, par- 
ticularly to the quality control of manufactured articles; and 

WHEREAS his contributions to statistical methods and theory were sub- 
stantial in such varied fields as the foundations of probability, inequalities 
on distributions in terms of moments, the treatment of time series, long cycles 
resulting from repeated integration, tolerance limits, analysis of variance, 
asymptotic large-sample distributions, and the estimation of parameters of 
stochastic processes; and 

WHEREAS Abraham Wald contributed also to economics and economic 
statistics by his penetrating studies of equations of production and of general 
equilibrium, of index numbers of cost of living, and by the determination of 
indifference loci by means of Engel curves; and 

WHEREAS he made in his earlier career in Europe valuable contributions
to pure mathematics in the fields of differential geometry and the axiomatiza-
tion of metric spaces; and

WHEREAS he served the American Statistical Association as Vice Presi- 
dent and the Institute of Mathematical Statistics as President and as member 
of its Council and of the Editorial Board of the Annals of Mathematical Sta- 
tistics; and 

WHEREAS great inspiration is to be derived from the example of Abraham 
Wald in his surmounting of the difficulties caused by the discrimination and 
restrictions that, in his East European environment, denied him the oppor- 
tunities of the primary and secondary schools; in his entrance to the university 
in his native city of Klausenburg by examinations for which he had prepared 
himself; in his graduation with distinction and his brilliant graduate work at 
the University of Vienna; in his migration to this country at the time of the 
fall of Austria; in his fortitude in enduring the loss of his nearest relatives by 
the Nazi policy of genocide; in his devotion to our science and in his habits 
of hard work which brought a great volume of substantial contributions; and 
in his ability to be friendly and kind under the severest strains; now therefore 

Be it resolved that the American Statistical Association and the In- 
stitute of Mathematical Statistics jointly record their deepest sorrow 
and regret at the untimely passing away in middle life of a great con- 
tributor, and at the further tragedy that his wife also was taken; and that 
this Association extends its sincere sympathy to the bereaved relatives and 
particularly to the two young children who remain. 


Philip M. Hauser, reporting for the Committee on Resolutions, proposed the
following Resolutions, which were voted unanimously by the membership in
attendance.


WHEREAS William G. Cochran of Johns Hopkins University, Baltimore, 
has served the Association in the capacity of Editor of the Journal of the 
American Statistical Association from June 1945 to December 1950, and has 
now retired from this position at the end of his term; 

THEREFORE be it resolved that the members of the Association in an- 
nual meeting assembled do hereby express great appreciation of the out- 
standing contributions that have been made by Professor Cochran during 
his term of office, and hereby express our recognition of his valuable service
to the Association, so ably performed over this long term of years. 

RESOLVED that the Officers and Members of the American Statistical
Association express their profound appreciation to the Local Arrangements 
Committee and to all of the individual members of the Chicago Chapter for 
their outstanding work and hospitality in connection with the arrangements 
for the 110th Annual Meeting of the Association. 

RESOLVED that the Officers and Members of the American Statistical 
Association express deep appreciation for the excellent program prepared by 
the members of the Program Committee under the chairmanship of Mortimer 
Spiegelman. 


Alfred N. Watson, Chairman of the Committee on Public Relations, reported 
on the activities of his committee during 1950. 

Walter E. Hoadley, Chairman of the Committee on Institutional Membership
for 1950, reported on the success of the drive for institutional membership. (All
committee reports will appear in The American Statistician during the course
of 1951.)

The meeting was adjourned. 


Report of the Board of Directors 


During 1950, the Board of Directors was gratified to find that as a result of 
very careful operation it was possible to show a net operating income of over 
$6,000, thus ending the year with a net surplus of approximately $3,000, com- 
pletely wiping out the deficit which developed during 1947 and 1948. As a result 
of this easing of the financial situation, it was possible to turn more active atten- 
tion to the Program of the Association and a number of projects are now success- 
fully underway. There has been very widespread membership participation in 
the activities of our committees and regional and chapter meetings. The Associa- 
tion has extended its influence greatly and has received a number of requests 
from private and government agencies for the appointment of advisory com-
mittees to assist in the planning of statistical programs.


The Program Committee 


The Program Committee for the 1950 Annual Meeting under the Chairman- 
ship of Mortimer Spiegelman of the Metropolitan Life Insurance Company has 
prepared one of the most excellent Annual Meetings ever held by the Association. 
Joint meetings have been held with 15 other Associations representing econo- 
mists, marketing analysts, statistical quality control engineers, mathematicians, 
psychologists, psychometricians, biologists and public health statisticians. 

The Biometrics Section, as well as the new Sections on the “Training of 
Statisticians” and “Business and Economic Statistics” have participated in 
planning a substantial portion of the Program of the December Annual Meeting 
and are now engaged in working out charters for additional activities to be 
undertaken during the coming year. 

Sylvia Weyl prepared a “Guide for Speakers and Chairmen at the Annual
Meeting” for the Program Committee which has been distributed to all the 
participants of this year’s meetings. There has been enthusiastic response to the 
“Guide” and a number of requests have come in to the Secretary’s office from 
members wishing to deliver papers at meetings of other societies. 

At the recommendation of the Board of Directors, W. Edwards Deming ex- 
plored the matter of the formation of a Committee on Statistics in the Physical 
Sciences and Philip M. Hauser explored the possibilities for a Committee on
Statistics in the Social Sciences. As a result of their discussions and recommenda- 
tions, committees in both of these areas have been established and will participate 
in the planning of the 1951 Annual Meetings. It is confidently expected that 
the Committee on Statistics in the Physical Sciences will be able to plan a number 
of joint programs with physicists, astronomers, engineers and other groups which 
have an interest in statistics. The Committee on Statistics in the Social Sciences 
will be working closely with the American Sociological Society Committee on 
Statistics which is specifically charged with the responsibility of cooperating with 
the ASA in the area of statistics. 


Editorship of the Journal 


At the last annual meeting of the Association the Council elected W. Allen 
Wallis of the University of Chicago to succeed W. G. Cochran as Editor of the 
Journal. Wallis assumed full responsibility for the Book Review Section and 
began to review all new manuscripts early in the year. The September 1950 
issue was the first in which published book reviews were under his editorship 
and the March 1951 issue will be the first for which he bears full editorial re- 
sponsibility. The Editorial Board has been organized with a view to transferring 
a large portion of the editorial burden to the Associate Editors. Lester S. Kellogg,
Howard L. Jones and William G. Madow have been appointed Associate Editors 
for the terms expiring in December of 1951. Albert H. Bowker, Solomon Fabri- 
cant and Frederick Mosteller have been appointed for the terms expiring in 
December of 1952. In addition, all living former Editors of the Journal have 
agreed to serve on an advisory panel which can be consulted from time to time 
on matters of general editorial policy. A fairly large group of Editorial Col- 
laborators has been appointed, each to serve for a year; they will do the basic 
evaluation and criticism of manuscripts, and have other important functions. 


Pamphlet about the Association 


The Secretary’s Office, with the assistance of President Wilks and several 
other Officers has prepared a pamphlet concerning the Association—What It Is; 
What It Does and What It Offers. This has been distributed to prospective mem- 
bers during the past few months and is available for distribution by individuals 
or Chapters wishing to participate in a membership recruitment campaign. 


The Bureau of Mines 


The Bureau of Mines of the U. S. Government has asked the Association to
establish a project to make a study of the entire statistical system of the Bureau 
of Mines and to make appropriate recommendations for changes and improve- 
ments. A committee to sponsor this project has been established under the 
chairmanship of Raymond T. Bowman and has arranged to employ a full time 
person to direct the study. The project will be entirely the responsibility of the
Association, but will be financed by the Bureau of Mines.


The Statisticians’ Handbook 


Carrying out the decision of the Council, the Association has concluded an 
agreement with the McGraw-Hill Publishing Company for the publication of a 
“Statisticians’ Handbook.” It is expected that this publication will prove ex- 
tremely valuable to working statisticians throughout the world and will fill a 
need of which the Association has long been aware. Frederick Mosteller, of 
Harvard University, has agreed to edit the handbook and will be working on
the project during the next few years. 


The Publication Program 


The Publication Program of the Association has been revived with the pub- 
lication of a monograph on Acceptance Sampling which has sold most success- 
fully. Your Board has voted to establish a regular series of monographs. The 
Committee on Publications is at present considering future titles for this series. 


Abstracting Project for the International Statistical Institute 


At the request of G. Goudswaard, Director of the Permanent Office of the 
International Statistical Institute, the Association agreed to outline a plan for 
the establishment of a statistical methodology abstracting service to be pub- 
lished by the ISI. A small committee under the Chairmanship of Max Wood- 
bury drew up plans for such a project for submission to the ISI. Although this 
project is at present not an official one, it is expected that an abstracting service 
may be developed within the next year or so. 


Regional Meetings 


A Mid-West regional meeting of the Association was held in Ann Arbor in 
June 1950 under the Chairmanship of Donald R. G. Cowan, the senior repre- 
sentative to the Council from the Mid-west District. Plans for a mid-west re- 
gional meeting for the summer of 1951 are already underway under the Chair- 
manship of Howard Jones. The Association also co-sponsored a Conference on 
“Sampling Methods in Business” jointly with the University of Illinois, where 
the Association has an extremely active Chapter. Plans for the co-sponsorship 
of another Conference at the University of Illinois are already underway. 





Committees 


Two new committees have been established this year and have been extremely 
active and successful. The Committee on Public Relations under the Chairman- 
ship of Alfred N. Watson of the Curtis Publishing Co. has been most effective in 
its preliminary public relations work. It has publicized the articles appearing in 
the Journal and The American Statistician. It made available to the local chair- 
man the services of the Committee by assisting him in making arrangements for 
this Annual Meeting, getting in touch with all the participants and arranging to 
get advance copies of the papers so that they would be available for consultation 
by the newspaper men covering the Conferences. 

The Committee on Institutional Memberships under the Chairmanship of 
Walter E. Hoadley is engaged in an active campaign to recruit such members 
among business and industrial concerns. The Association now has 13 Institu- 
tional Members. Several hundred personal letters of invitation have been sent 
out to the officers of business and industrial concerns during the past month or 
so. We are confidently expecting an excellent response to these invitations and 
hope to derive substantial financial support for the Association’s Program in this 
way. 

The National Research Council “Committee on Sex Research Problems,” 
which sponsors the work of the Institute for Sex Research of the University of 
Indiana has asked the Association to establish a subcommittee of the Commis- 
sion on Statistical Standards and Organization to advise the Institute concerning 
its statistical work. The Association has established this sub-committee under 
the Chairmanship of William G. Cochran of Johns Hopkins University and the
Committee has already spent considerable time at the University of Indiana 
working with Alfred C. Kinsey, who directs the work of the Institute for Sex 
Research. 

The Board of Directors has recently established a three-member committee 
as a sub-committee to maintain a closer relationship between the Board of 
Directors and the Secretarial Staff of the Association. The Board believes that 
this should result in a much more intimate knowledge by the Board of the 
operating problems and procedures of the Association. 

Many members of the Association, especially those engaged in research in
economics and social science in government, business and industry, are basically
interested in statistical data. The Board has assumed that the Association has 
some responsibility in making information about such data available to its 
Members and has, therefore, asked Morris Ullman of the U. S. Bureau of the 
Census to make a study concerning the possibility of establishing an abstracting 
service for important statistical data in such a form that it can be published by 
the Journal or The American Statistician. A Committee on Data Sources has 
been established and is presently working on plans for material to be published 
during 1951. 


Biometrics 


The Association has entered into a contractual arrangement with the Bio- 
metric Society by which the ownership of the publication Biometrics was trans- 
ferred to the Society effective as of the first day of March 1950. The Transfer 
Agreement included the provision by which American Statistical Association 
members could continue to receive the publication Biometrics through the ASA at 
subscription rates equal to the amount allocated to the cost of the periodical in 
the dues of American members of the Society. The Society agreed that the title 
page of Biometrics would indicate that the publication was founded by the 
Biometrics Section of the American Statistical Association. It was also agreed 
that the Biometrics Section would be privileged to name one member to the 
Editorial Board of Biometrics to serve for three (3) years from the date of such 
nomination. Finally, it was agreed that if at any time during the period prior to 
the December 1954 issue, the Society should propose to discontinue the publica- 
tion of Biometrics, it would notify the ASA. If the Association should then wish 
to resume all rights and title to Biometrics and continue its publication, it would 
be free to do so. 

Submitted by the Board of Directors: 
Samuel S. Wilks, President
Dorothy S. Brady
Gertrude M. Cox
W. Edwards Deming
Harold A. Freeman
Cyril H. Goulden
Philip M. Hauser
Simon Kuznets
Lowell J. Reed
Frederick F. Stephan
Willard L. Thorpe
Louis L. Thurstone
Samuel Weiss
Report of the Secretary-Treasurer


I am delighted to report that for the second successive year the Association’s 
income has exceeded its expenses. The 1950 budget as approved by the Council 
provided for a net income of $2,002, while the actual net income for the year was 
over $6,000. It is interesting to compare this net income with the net loss of over
$13,000 which occurred during 1948. In two short years the deficit of some $7,700 
which was on the books at the end of 1948 has been eliminated and we will start 
the new year with a surplus of approximately $3,000. Although this surplus isn’t 
too large, as statisticians, you know that oftentimes the trend is more significant 
than the level. 

For the past two years total membership in the American Statistical Associa- 
tion has remained relatively stable. During 1950 the number of new members 
joining the Association approximately equalled the number of those who resigned 
or failed to pay their dues. The National Office has recently initiated what is the 
beginning of a membership campaign by having a promotional brochure printed 
and distributed, and it is believed that the effects of this and other measures will 
be felt during 1951. 

The 1950 membership is composed of the following groups: 


Honorary members 
Fellows 
Regular members* 


Total membership 
Institutional members* 


* Members who joined after October 1, 1950 are considered 1951 members and are not included in 
the above tally. 


The members of the Biometrics Section, at the end of December 1950 num- 
bered 712, of whom 123 were associate members and 589 were also regular mem- 
bers of the Association. 

Samuel Weiss
Secretary-Treasurer 


Report of the Auditors 
To the Board of Directors of 
American Statistical Association 


We have examined the attached financial statements of American Statistical 
Association relating to the year ended December 31, 1950. Our examination was 
made in accordance with generally accepted auditing standards, and accordingly 
included such tests of the accounting records and such other auditing procedures 
as we considered necessary in the circumstances. 

The recorded cash receipts for the year were traced to the deposits shown on 
the bank statements and the amounts for dues and subscriptions were tested with 
the membership and subscription records. The paid checks were inspected and 
related vouchers tested in support of cash disbursements for the year. The bank 
balances were reconciled with amounts reported direct to us by the depositaries 
and the cash on hand and securities owned at December 31, 1950 were verified 
by inspection. We did not check the membership and subscription records in
detail or make any independent verification of the inventory of old Journals, the 
office records of which are based, in part, on data assembled in prior years, no 
recent physical inventory having been taken. 

The life membership reserve at December 31, 1950 reflects the amount needed 
to support a life annuity for each life member in the same annual amount as 
that which could have been purchased by the original lump sum payment, based 
on a 2½% interest rate, the 1937 Standard Annuity Table and the age of the life
member when the lump sum payment was made, in accordance with a resolution 
of the Board of Directors adopted pursuant to a mail ballot in January 1949. 
The amount treated as income from life memberships in 1950 represents the ex- 
cess of the reserve at the beginning of the year over the required reserve at the 
end of the year. 
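The reserve rule described above can be expressed as a small calculation. The Python sketch below is one plausible reading of that rule, not the auditors' actual working. It assumes a life annuity payable at the start of each year. The survival probabilities and interest rate passed to it are placeholders the user supplies; they are not values from the 1937 Standard Annuity Table. The last function applies the stated income rule to the reserve figures from the balance sheet.

```python
from decimal import Decimal


def annuity_due_factor(survival_probs: list[float], rate: float) -> float:
    """Value of 1 per year paid at the start of each year while the member lives.

    survival_probs[k] is the probability of surviving k more years (so the first
    entry is 1.0); payments are discounted by v = 1 / (1 + rate).
    """
    v = 1.0 / (1.0 + rate)
    return sum(p * v ** k for k, p in enumerate(survival_probs))


def required_reserve(lump_sum: float, surv_at_entry: list[float],
                     surv_now: list[float], rate: float) -> float:
    """Reserve for one life member (an assumed reading of the Board's rule).

    The lump sum is converted into the annual annuity it could have bought at the
    age of payment, and that annual amount is then revalued at the current age.
    """
    annual_amount = lump_sum / annuity_due_factor(surv_at_entry, rate)
    return annual_amount * annuity_due_factor(surv_now, rate)


def life_membership_income(opening_reserve: Decimal, closing_reserve: Decimal) -> Decimal:
    """Income for the year: excess of the opening reserve over the required closing reserve."""
    return opening_reserve - closing_reserve


# Reserve at December 31, 1949 and at December 31, 1950, from the balance sheet.
print(life_membership_income(Decimal("3066.62"), Decimal("2991.50")))  # 75.12
```

The printed difference, 75.12, equals the life membership income reported for 1950 in the statement of income. Decimal is used for the dollar amounts so that the cents are exact.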

In our opinion, the accompanying statements present fairly the position of 
American Statistical Association at December 31, 1950 and the results of its 
operations for the year, in conformity with generally accepted accounting 
principles applied on a basis consistent with that of the preceding year. 

Snyder, Farr and Company
Washington, D. C. 
April 10, 1951 


AMERICAN STATISTICAL ASSOCIATION
BALANCE SHEET
December 31,
Assets 1950 1949

Cash in banks and on hand $21,544.82 $10,233.31
Accounts receivable 1,787.97 1,721.10
Investments in United States Savings Bonds:
Series G, due 1962, at cost
Series D, at redemption value
Inventory of old Journals, at approximate cost 1,824.83
Inventory of monograph on Acceptance Sampling,
785.54
Furniture and fixtures, at cost less depreciation 2,358.94
Deferred charges 1,133.15

$32,535.25 $20,867.89

Accounts payable $ 6,248.52 $ 3,499.74
Deferred income (collections applicable to subsequent year):
$14,525.50 13,431.25
4,796.80 3,897.57
655.50 53.10
$19,977.80 $17,381.92

Life membership reserve $ 2,991.50 $ 3,066.62
Surplus or (deficit) account, per statement 3,317.43 (3,080.39)

$32,535.25 $20,867.89




















AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1951


AMERICAN STATISTICAL ASSOCIATION 
STATEMENT OF INCOME AND DEFICIT ACCOUNTS


Year ending December 31, 
Income: 1950 1949 

Dues—current year $32,176.68 $32,366.93 
Dues—prior years 260.25
Life membership income 75.12 64.78 
Subscriptions 8,840.71 8,576.69 
Advertising 1,602.27 1,311.80 
Journal sales 1,550.28 2,841.52 
Biometrics sales 304.93 
American Statistician sales 129.15 
Acceptance Sampling monograph sales 1,322.60
Mailing list income 459.22
Annual meeting income
Dividends and interest 549.40
Gain on sale of securities 219.96
Miscellaneous 390.04


$48,512.67 $46,780.34 








Expenses: 
Journal—printing, mailing and reprints $10,622.43 $10,702.77 
Salaries and wages, less in 1949 $253.32 accrued
annual leave expense applicable to prior year 15,436.81 18,637.81
American Statistician 4,941.96 3,716.53
Biometrics Section expenses 750.00
Acceptance Sampling monograph—cost of sales 658.47
2,350.00 2,617.50
Office supplies, printing and mimeographing 1,756.92 1,335.65
Postage 1,616.25 1,186.51
Telephone and telegraph 743.49 618.70
Travel expense—officers 777.99 661.08
Depreciation of furniture and equipment 512.16 487.36
Accounting services 720.00 300.00
Storage of old Journals 96.00 185.58
Cost of old Journals sold 182.93 100.00
Miscellaneous 1,699.50 784.22


$42,114.85 $42,083.71 


Excess of income over expenses for the year $ 6,397.82 $ 4,696.63 
Surplus or (deficit) account 
At beginning of the year (3,080.39) (7,777.02) 


At end of year $ 3,317.43 $(3,080.39) 
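The totals in the two statements are consistent with each other. They also support the Secretary-Treasurer's summary: net income of over $6,000 and a closing surplus of about $3,000. The short Python sketch below checks this arithmetic using only totals printed in the statements. The dictionary layout and function names are illustrative and are not part of the report. The last 1949 deferred-income item is taken as 53.10, which is the value implied by the printed subtotal.

```python
from decimal import Decimal


def dollars(text: str) -> Decimal:
    """Parse an amount such as '$32,535.25' or '(3,080.39)'; parentheses mean negative."""
    text = text.replace("$", "").replace(",", "").strip()
    if text.startswith("(") and text.endswith(")"):
        return -Decimal(text[1:-1])
    return Decimal(text)


# Totals transcribed from the statement of income and deficit accounts.
INCOME_STATEMENT = {
    "1950": {"income": "48,512.67", "expenses": "42,114.85", "opening_surplus": "(3,080.39)"},
    "1949": {"income": "46,780.34", "expenses": "42,083.71", "opening_surplus": "(7,777.02)"},
}

# Right-hand side of the balance sheet; 53.10 is the assumed reading noted above.
BALANCE_SHEET = {
    "1950": {"total": "32,535.25", "accounts_payable": "6,248.52",
             "deferred_items": ["14,525.50", "4,796.80", "655.50"],
             "deferred_total": "19,977.80", "life_reserve": "2,991.50"},
    "1949": {"total": "20,867.89", "accounts_payable": "3,499.74",
             "deferred_items": ["13,431.25", "3,897.57", "53.10"],
             "deferred_total": "17,381.92", "life_reserve": "3,066.62"},
}


def surplus_roll_forward(year: str) -> tuple[Decimal, Decimal]:
    """Return (excess of income over expenses, surplus at end of year)."""
    row = INCOME_STATEMENT[year]
    excess = dollars(row["income"]) - dollars(row["expenses"])
    return excess, dollars(row["opening_surplus"]) + excess


def balance_sheet_ties(year: str) -> bool:
    """Check the deferred-income subtotal, and that the right-hand side (using the
    closing surplus from the income statement) equals the printed total."""
    sheet = BALANCE_SHEET[year]
    deferred = sum((dollars(x) for x in sheet["deferred_items"]), Decimal("0"))
    _, closing_surplus = surplus_roll_forward(year)
    right_side = (dollars(sheet["accounts_payable"]) + deferred
                  + dollars(sheet["life_reserve"]) + closing_surplus)
    return deferred == dollars(sheet["deferred_total"]) and right_side == dollars(sheet["total"])


for year in ("1950", "1949"):
    excess, closing = surplus_roll_forward(year)
    print(year, excess, closing, balance_sheet_ties(year))
```

Running it prints `1950 6397.82 3317.43 True` and `1949 4696.63 -3080.39 True`. These figures reproduce the printed excess of income over expenses and the surplus (deficit) at the end of each year.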
REPRINTS OF ABSTRACTS IN STATISTICAL METHODOLOGY
Edited by Max A. Woodbury, Princeton University
Several abstracting services provide coverage of statistical papers in various fields, e.g., Biological
Abstracts, Journal of Marketing, Mathematical Reviews, Population Index, and Psychological Abstracts.
Even though the abstracts are prepared for readers in special fields of application rather than for
statisticians as such, the collection of as many as possible of them into one place may be a useful
service to statisticians. Accordingly, the Journal is initiating this section on a trial basis, under the
editorship of Max A. Woodbury, Box 708, Princeton, New Jersey. Comments and suggestions will
be welcome.


POPULATION INDEX 
1950 


Mahalanobis, P. C., et al.
Anthropometric survey of the United
Provinces, 1941: a statistical study. San-
khyā, The Indian Journal of Statistics 9(2-
3):90-324. 1949.

A detailed report on the methods and re-
sults of the anthropometric survey carried
out in connection with the population cen-
sus of 1941. Part I, "The field survey," by
D. N. Majumdar, gives a general account
of the field survey and a description of the
definitions and techniques used. Part II,
"Statistical analysis," by P. C. Mahalanobis
and C. R. Rao, deals with the basic statis-
tical concepts and their application to the
anthropological classification. Part III,
"Anthropological observations," by P. C.
Mahalanobis, contains a "Supplement,"
pp. 203-236, that presents notes on
the ethnological characteristics of the dif-
ferent castes and tribes.


Seal, H. L.
A historical note on the use of χ² to test
the applicability of a mortality table gradu-
1033 Lacroix, Max.

Successive derivations of the Gompertz
law. (Les dérivées successives de la loi
de Gompertz.) Extract from the Bulletin de l'As-
sociation des Actuaires de l'Université de
Lyon. Paris, Aug., 1949. Pp. 11-15. [Mim-
eographed]


crimination with multip 
Sankhyā, The Indian Journal of Statistics 9(4):
343-366. Sept., 1949. 

A methodological examination of how far 
the tests of significance are affected by in- 
crease in the number of characters. 


Rasor, Eugene A.

fitting of = curves by means 
of a nomograph. Journal of the American
Statistical Association 44(248):548-553.
Dec., 1949.


Simaika, J. B.
the significance of a typical value in 
the renewal theory. Skandinavisk Aktuarie- 
tidskrift 30(2):121-129. 1947. 

"If P(t) represents — ——— into a 

on by time t s) is the propor- 

Goes now tahventit chapiaaaed by time s 
after their introduction, the total number of
individuals eliminated by time t is ∫₀ᵗ P(t −
s) dQ(s) = P(t − C), say. The purpose is to in-
terpret C in two cases where life is bound-
ed and unbounded respectively. Includes
as a special case a human population in
which the rate of increase is constant."
[J.I.A. 74(Part II, 339):360]


Talacko, J.

a theory of growth with special reference
to population problems. Aktuárské
Vědy 8(1):21-35. 1947-1948.

“Discusses the various laws of growth
obtained as solutions of differential equa-
tions. Proceeds to the generalized logis-
tic curve and details methods of determin-
ing their constants.” [J.I.A. 74(Part II, 339)]


Wallis, W. Allen. 
Statistics of the Kinsey report. Journal
of the American Statistical Association 44 
(248):463-484. Dec., 1949. 
An appraisal of the collection, presenta- 
tion, and interpretation of the data. For 
< cmaag to the original report, see 14(2)- 
Yates, Frank.
Sampling methods for censuses and sur-
veys. London, Charles Griffin and Co., 
Ltd., 1949. xiv, 318 pp. 

This work originated in the request of 
the United Nations Sub-Commission on Sta- 
tistical Sampling that a manual be prepared 
to assist in the 1950 censuses of agricul-
ture and population. It seeks "to cover all
the modern developments of sampling the-
ory which are of importance in census and
survey work, and to give an adequate dis- 
cussion of the complexities that are en- 
countered in their practical application." 
Successive chapters deal with: 1. The 
place of sampling in census work; 2. Re- 
quirements of a good sample; 3. The struc- 
ture of various types of sample; 4. Practi- 
cal problems arising in the planning of a 
survey; 5. Problems arising in the execu- 
tion and analysis of a survey; 6. Estimation 
of the population values; 7. Estimation of
the sampling error; 8. Efficiency. There 
is a full bibliography. 


1587 Birnbaum, Z. W., and Sirken, Mon-
roe G.

Bias due to non-availability in sampling 

surveys. Journal of the American Statisti- 
cal Association 45 (249): 98-111. March, 
1950. 
"A technique is presented for the treat- 
ment of errors introduced into sampling 
surveys due to the non-availability of re- 
spondents.” 


1589 Combes, B.

The principle of Bayes and the problem
of adjustment. Application to the construc-
tion of life tables. (Le principe de Bayes
et le problème de l'ajustement. Application
à la construction des tables de mortalité.)
Bulletin Trimestriel de l'Institut des Actu-
aires Français, Nos. 180-181, pp. 1-70.
Sept.-Dec., 1947.

",..concerned with the estimation by 
maximum likelihood of the rate of mortality
at age x based on the observation of the ra-
tio of deaths to exposed to risk at the n ages
x, x+1,...,%... ,4 and on the hypothesis
that the 'true' rates monotonically increase
with age. The author points out that the re-
sulting method is only suitable as a pre-
liminary smoothing of data prior to gradu-
ation and illustrates it on the old AF table
and on a new mortality table of invalid
lives.” [J.I.A. 75(Part I, 340):109]


1593 Hauser, Philip M.

Some aspects of methodological research 
in the 1950 census. Public Opinion Quar-
terly 14(1):5-13. Spring, 1950.

Describes "some of the recent work done 
by the Census in the fields of sampling 
methods, enumeration techniques, question 
wording, and the training and supervising 
of field staff. It also outlines various meth- 
odological studies which will be conducted 
during the current Census." 


1595 Hyrenius, Hannes.

Sampling distributions from a compound
normal parent population. Skandinavisk Ak-
tuarietidskrift, No. 3-4, pp. 180-187. 1949.
[In English]


Irwin, J. O.

The standard error of an estimate of ex-
pectation of life, with special reference to
Se re a in experi-
ments with mice. Journal of Hygiene (Lon-
don) 47(2):188 ff. 1949.


1598 Madow, Lillian H. 

On the use of the county as the primary 
sampling unit for state estimates. Journal 
of the American Statistical Association 45 
(249):30-47. March, 1950. 

An analysis of the efficiency of using the 
county as the primary sampling unit for 
making state estimates of socio-economic 
and agricultural characteristics, with ref- 
erence to experience in North Carolina. 


” umes Jack, and Morrison, Winifred.

Simplified procedures for fitting a Gom- 
pertz curve and a modified exponential 
curve. Journal of the American Statistical 
Association 45(249):87-97. March, 1950. 

The methods described are “useful in 
determining which type of growth curve is 
most appropriate for a given set of data." 


2084 Arley, N. 
On the 'birth- and death' process. Skan-
dinavisk Aktuarietidskrift 8 9(1-2):21-26.


"An alternative method of solution of the 
general ‘birth and death’ differential equa-
tion.” [J.I.A. 76(Part I, 342):90]


mT ee 
Proposal for a coefficient of accuracy.
(Propuesta de un coeficiente de exactitud) 
Estadística 8(26):49-58. March, 1950.
"This paper concerns methods of ap- 
praising the accuracy of responses regard- 
ing age in census enumerations." After 
discussing some methods published up to 
the present time, "the author proposes 
another method which he calls the coeffi- 
cient of accuracy.” 


PSYCHOLOGICAL ABSTRACTS 
1950 


17. Chernoff, Herman. (Brown U., Providence,
R. I.) Asymptotic studentization in testing of
hypotheses. Ann. math. Statist., 1949, 20, 268-278.
—A method for finding critical regions of almost
constant size is presented. “Under reasonable
conditions the $s$th step of this method gives a
critical region of size $\alpha + R_s(\theta)$ where $\theta$ is the
unknown value of the nuisance parameter,
$R_s(\theta) = O(N^{-s/2})$, and $N$ is the sample size. The
first step of this method gives the region which is
obtained by assuming that an estimate $\hat{\theta}$ of the
nuisance parameter is actually equal to $\theta$.”—G. C.
Carter.

427. Emmett, W. G. (U. Edinburgh, Scotland.) 
Factor analysis by Lawley’s method of maximum 
likelihood. Brit. J. Psychol., Statis. Sect., 1949, 2, 
90-97.—A method of analysis for 3 or more factors is
presented. A practical example is used to illustrate
Lawley’s maximum likelihood method of factor
analysis. Checks on arithmetical accuracy are
made at each stage. Lawley's significance tests for
the matrix of residuals, a single residual and a factor
loading are demonstrated. Formulae and tables are
included.—G. C. Carter.

429. Finney, D. J. (U. Oxford, Eng.) The
truncated binomial distribution. Ann. Eugen.,
Camb., 1949, 14, 319-328.—For data on a character
distributed in sibships according to a binomial
probability law from which, by reason of the nature
of the records, one of the extreme classes is absent,
the calculation of the maximum likelihood estimate
of the parameter may be performed as an iterative
scoring process. Algebraically this is equivalent to
any other iterative solution, but in practice it is much
simpler than others because the chief functions
required in it can be tabulated. Tables enable the
estimate to be obtained in 2 cycles of iteration, each
requiring only summations of products of tabulated
quantities, and additional accuracy can be gained
by linear interpolation. The procedure is illustrated
on data relating to the frequency of albinism in
human families. Formulae for the extension of the
method to a doubly truncated distribution, and a
table of numerical values, are also given.—A.
Wei
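Assume, as seems natural for the albinism data, that only sibships with at least one affected child appear in the records, so the missing extreme class is $r = 0$. For sibships of size $s$ with $r$ affected, setting the derivative of the truncated-binomial log-likelihood to zero gives $\sum r = \sum s\hat{p}/\bigl(1 - (1-\hat{p})^{s}\bigr)$. The Python sketch below solves this by bisection, a plainer substitute for the tabulated scoring process the abstract describes; the `(s, r)` data layout and the function name are hypothetical.

```python
def truncated_binomial_mle(families, tol=1e-12):
    """ML estimate of p when sibships with no affected member are unrecorded.

    families: iterable of (s, r) pairs, sibship size s and number affected r,
    with 1 <= r <= s. Solves sum(r) = sum(s * p / (1 - (1 - p)**s)) for p.
    The right-hand side rises from len(families) (p -> 0) to sum(s) (p -> 1),
    so bisection on (0, 1) finds the root when the data are not degenerate.
    """
    families = list(families)
    total_r = sum(r for _, r in families)

    def expected_total(p):
        # Expected number affected, summed over families, given r >= 1.
        return sum(s * p / (1.0 - (1.0 - p) ** s) for s, _ in families)

    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_total(mid) < total_r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```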


881. Hamilton, Max. (U. Coll., London, Eng.)
A simple diagram for obtaining tetrachoric
correlation coefficients. Brit. J. Psychol., 1949, 39,
168-171.—A simple nomograph for use in obtaining
tetrachoric correlation coefficients is presented and
its use described. “The results are generally accurate
to three decimal places when the more uneven
dichotomy is no further than 70% and 30%.”—L. E.
une. 
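Hamilton's nomograph cannot be reproduced from the abstract. A different, widely used shortcut for the same quantity is the cosine-pi approximation $r_t \approx \cos\bigl(\pi / (1 + \sqrt{ad/bc})\bigr)$ for a 2×2 table with cell counts $a, b, c, d$. The Python sketch below implements that approximation, not the nomograph, and makes no claim to the three-decimal accuracy quoted above; the function name is hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for the 2x2 table

                    Y high   Y low
        X high        a        b
        X low         c        d
    """
    if b * c == 0 and a * d == 0:
        raise ValueError("degenerate table")
    if b * c == 0:
        return 1.0   # no discordant cells
    if a * d == 0:
        return -1.0  # no concordant cells
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))
```

When $ad = bc$ the approximation gives $\cos(\pi/2) = 0$, and swapping the rows of the table changes only its sign.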


882. Hansen, Morris H., & Hurwitz, William N. 
On the determination of optimum probabilities in
sampling. Ann. math. Statist., 1949, 20(3), 426-
432.—A method for determining the probabilities of
selection which minimize the variance of the sample
estimate at a fixed cost is developed. Approxima-
tions with practical applications are presented. In
many sampling problems, the use of constant prob-
abilities is neither necessary nor desirable. It is not
only possible to obtain unbiased or consistent esti-
mates with varying probabilities of selection of
sampling units, but it is also possible to reduce the
variance of sample estimates by use of this device.
A table and formulae are included.—G. C. Carter.
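The abstract's point, that varying selection probabilities can reduce variance, can be seen with the Hansen-Hurwitz estimator for sampling with replacement: each draw contributes $y_i/p_i$, and over $n$ draws the estimator of the total $Y$ has variance $\frac{1}{n}\sum_i p_i (y_i/p_i - Y)^2$. The Python sketch below evaluates both. It does not reproduce the paper's cost-constrained optimum, and the function names are hypothetical.

```python
def hh_estimate(sample_y, sample_p):
    """Hansen-Hurwitz estimate of a population total from n draws made with
    replacement; sample_p[i] is the per-draw selection probability of the
    unit obtained on draw i."""
    n = len(sample_y)
    return sum(y / p for y, p in zip(sample_y, sample_p)) / n


def hh_variance(pop_y, pop_p, n):
    """Exact variance of hh_estimate over repeated samples of n draws,
    computed from the whole population (pop_p must sum to 1)."""
    total = sum(pop_y)
    return sum(p * (y / p - total) ** 2 for y, p in zip(pop_y, pop_p)) / n
```

For example, with $y = (10, 20, 30, 40)$ and $n = 2$, equal probabilities give a variance of 1000, whereas $p_i \propto y_i$ gives zero. In practice one uses a size measure believed to be roughly proportional to $y_i$.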


884. Hoel, P. G., & Peterson, R. P. (U. California,
Los Angeles.) A solution to the problem of
optimum classification. Ann. math. Statist., 1949,
20(3), 433-438.—By using a general theorem, the
space of the variables of classification is separated
into population regions such that the probability of a
correct classification is maximized. This theorem is
applicable for any number of populations and vari-
ables but requires a knowledge of population param-
eters and probabilities. A second theorem makes
it possible to establish a large sample criterion for
determining an optimum set of estimates for the un-
known parameters. The two theorems may be com-
bined to yield a solution to the problem of how best
to discriminate between two or more populations.
Mathematical derivations are included.—G. C.
Carter.
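When the population parameters and prior probabilities are known, the regions that maximize the probability of correct classification assign an observation to the population with the largest product of prior probability and density. A minimal Python sketch for one-dimensional normal populations follows; the normality is an assumption made only to keep the example short (the theorem itself is general), and `classify` and `normal_pdf` are hypothetical names.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def classify(x, populations):
    """populations: list of (prior, mean, sd) with known parameters.

    Returns the index of the population maximizing prior * density at x,
    which maximizes the probability of a correct classification.
    """
    scores = [prior * normal_pdf(x, m, s) for prior, m, s in populations]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal priors and unit variances the boundary between means 0 and 2 is the midpoint 1; raising the first prior to 0.9 moves it to $1 + \tfrac{1}{2}\ln 9 \approx 2.10$.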

890. Madow, William G. (U. North Carolina,
Chapel Hill.) On the theory of systematic sampling,
II. Ann. math. Statist., 1949, 20(3), 333-354.—Two
theorems which have applications in sampling are
derived. In designing sample surveys we should try
to induce negative correlation between strata. If a
population has a concave upwards correlogram, and
if strata are defined in an optimum fashion for the
selection of one element at random from each stra-
tum, then we can define a systematic type design
that will be more efficient than independent random
selection from each stratum. Various results in
systematic sampling of clusters are presented largely
as applications of the more general theorems.
Formulae are developed for showing the conditions
under which systematic sampling may be expected
to be more efficient than random or stratified ran-
dom sampling.—G. C. Carter.
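For a fixed finite population of $N = nk$ listed units, the exact variances behind such a comparison can be computed directly: a 1-in-$k$ systematic sample has only $k$ possible samples, and stratified random sampling with one unit from each block of $k$ consecutive units has variance $\frac{1}{n^2}\sum_h \sigma_h^2$. The Python sketch below computes both. It does not implement Madow's theorems, and the function names are hypothetical.

```python
def _check(y, k):
    if k < 1 or len(y) % k != 0:
        raise ValueError("sketch assumes N = n * k with k >= 1")


def systematic_variance(y, k):
    """Exact variance of the sample mean for a 1-in-k systematic sample
    with a random start r = 0, ..., k - 1 (each start equally likely)."""
    _check(y, k)
    n = len(y) // k
    pop_mean = sum(y) / len(y)
    sample_means = [sum(y[r::k]) / n for r in range(k)]
    return sum((m - pop_mean) ** 2 for m in sample_means) / k


def stratified_variance(y, k):
    """Variance of the sample mean when one unit is drawn at random from
    each stratum of k consecutive units (within-stratum variance uses
    divisor k, since each stratum is fully enumerated here)."""
    _check(y, k)
    n = len(y) // k
    total = 0.0
    for h in range(n):
        stratum = y[h * k:(h + 1) * k]
        mean = sum(stratum) / k
        total += sum((v - mean) ** 2 for v in stratum) / k
    return total / n ** 2
```

For the list $0,1,0,1,\ldots$ with $k = 2$ the systematic sample is the worse design (0.25 against 0.0625), whereas for $0,1,1,0,0,1,1,0$ it has zero variance, which illustrates why the comparison turns on the serial structure of the population.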


1550. Du Mas, Frank M. (Florida State U.,
Gainesville.) The coefficient of profile similarity.
J. clin. Psychol., 1949, 5, 123-131.—“This paper
attempts to derive a meaningful index of the similar-
ity of one profile to another. The distributions of
this index are ascertained and an error term rational-
ized. Tables were constructed so that little or no
computation is needed. These tables yield not
only the index, called the coefficient of profile similar-
ity, $r_p$, but the value of $r_p$ necessary for a test of the
null hypothesis at six different levels of confidence.
The emphasis of this paper has been to derive a
statistic and its error term that could be used rou-
tinely by […]. This statistic may be applied to test
batteries as well as to tests. Its greatest […] may
well be in the psychological clinic, vocational guid-
ance, and personnel selection.”—L. B. Heathers.

1551. Griffith, R. M. (U. Kentucky, Lexington.)
Odds adjustments by American horse-race bettors.
Amer. J. Psychol., 1949, 62, 290-294.—The socially
determined betting odds express (reciprocally) a psy-
chological probability, while the percentage of win-
ners at any odds group measures the true probabil-
ity. Any consistent discrepancy between the two may
cast light not only on the specific topics of horse-race
betting and gambling but on the more general field
of the […] of probabilities. Data were obtained from
a total of 1386 races. The analysis indicates that the
odds are, on the average, correct reflections of the
horses’ chances. The indifference point occurs at
odds of 6.1, with short-odded horses being under-
evaluated and long-odded horses over-evaluated.
The relations between these findings and some
previous reports on probabilities are pointed out.
—S. C. Ericksen.


1553. Quenouille, M. H. (Marischal Coll., Aber- 
deen, Scotland.) The analysis of covariance and non- 
orthogonal comparisons. Biometrics, 1948, 4, 240- 
246.—Orthogonality, defined by F. Yates as that
property of a design which ensures that the different
classes of effects to which the experimental material
is subject shall be capable of direct and separate
estimation without entanglement, is a desired feature
of any design; unfortunately, the design of experi-
ments cannot always be determined prior to the
commencement of an experiment, while experiments
which are planned as orthogonal are frequently con-
founded by extraneous causes. The application of
the analysis of covariance in non-orthogonal com-
parisons is demonstrated with data from Yates’
experiment on the growth (in terms of total bird
weights) of cockerels and pullets under 3 different
treatments.—F. C. Sumner.


2238. Simpson, E. H. (3 West End Ave., Pinner,
England.) Measurement of diversity. Nature,
Lond., 1949, 163, 688.—A measure of concentration
(related to Yule’s “characteristic” and Fisher's
“index of diversity”), for use when individuals of a
population are classified into groups, is developed in
terms of population constants.—A. C. Hoffman.
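The concentration measure described is generally known as Simpson's index: the probability that two individuals drawn at random belong to the same group, $\lambda = \sum_i \pi_i^2$. From counts $n_i$ with $N = \sum n_i$, the form $\sum n_i(n_i - 1)/\bigl(N(N-1)\bigr)$, which corresponds to drawing the two individuals without replacement, is unbiased for $\lambda$ under multinomial sampling. A short Python sketch, with a hypothetical function name:

```python
def simpson_index(counts):
    """Return (lambda_pop, lambda_hat) for group counts n_1, ..., n_k.

    lambda_pop = sum((n_i / N)**2) treats the observed proportions as the
    population constants pi_i; lambda_hat = sum(n_i (n_i - 1)) / (N (N - 1))
    is the probability that two of the N individuals, drawn without
    replacement, fall in the same group.
    """
    N = sum(counts)
    if N < 2:
        raise ValueError("need at least two individuals")
    lambda_pop = sum((n / N) ** 2 for n in counts)
    lambda_hat = sum(n * (n - 1) for n in counts) / (N * (N - 1))
    return lambda_pop, lambda_hat
```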


3931. Court, Arnold. (Office of the Quartermaster
General, Washington, D. C.) Separating frequency
distributions into two normal components. Science,
1949, 110, 500-501.—“The method for such separa-
tion, outlined by Charlier more than 40 years ago
but little noticed, is based on Pearson's general
method for finding two normal components in any
distribution, which assumes nothing about them
except their existence, and requires solution of a
complete ninth degree equation involving the first
five moments of the given distribution.” The assump-
tions about the components, and the characteristics
the distribution to be resolved must have for the
method to be valid, are stated. Examples and dis-
cussion of use of the method are presented briefly.
—B. R. Fisher.


4073. Stephan, Frederick F. (Princeton U., N. J.) 
Amer. J. Sociol., 1950, 55, 371-375.—
Sampling problems arise in social research in the
development of techniques of observation and meas-
urement and in the analysis and interpretation of
data. Fundamentally, they stem from limitations
on the number, accuracy, and scope of observations,
and their solution consists of finding the most
effective way of conducting research under these
restrictions. There is no universally “best” method of
sampling; the technique must be designed to fit the
particular circumstances of each situation. The
technical problems of sampling are outlined in terms
of the initial specification, design, costs, and […]
accuracy, operation, and use.—D. […]ick.


4536. Wolford, Opal Powell. How early back- 
ground affects behavior. J. Home Econ., 
1948, 40, 505-506.—The author reports a study of 
the dating behavior and the personal and family 
relationships of a group of high school seniors (162
boys and 235 girls). Dating behavior as indicated 
by the stage of dating seemed related to personal and 
family background factors. The seniors not dating 
presented a more negative, less wholesome picture 
in family relationships, feeling of self-regard, and 
social relations than did the other seniors. Of back- 
ground factors, the relationship to their families, 
including nativity, appeared to be the major 
determining factor for the girls. The major ones for 
the boys were their attitude of self-regard during 
childhood and their role in social activities. Veri- 
fication and further study are needed to determine 
the significance of these factors in the heterosexual 
adjustment of young people and the influence on 
later courtship and marriage.—(Courtesy of Child
Developm. Abstr.) 
4537. Zankov, L. V. Puti psikhologicheskogo
issledovaniia. (Methods of psychological investiga-
tion and the overcoming of bourgeois influences.)
Sovetsk. pedag., 1949, No. 5, 63-71.—Use of the
cross-sectional approach to the study of the child
is attacked on the ground that the essential task of
developmental psychology is to understand the
process whereby the child moves from one stage to
another. Various works are criticized because the
authors stopped at the establishment of age differ-
ences in the development of the child. Western […]
use of experiments is excoriated because of their
failure to come to grips with concrete problems of
child development.—R. A. Bauer.


4968. Brogden, Hubert E. A new coefficient:
application to biserial correlation and to estimation
of selective efficiency. Washington, D. C.: Personnel
Research Section, The Adjutant General's Office,
1949. (PRS Rep. 773.) ii, 10 p.—“A coefficient of
selective efficiency is proposed which can be use-
fully applied to selection problems involving the
evaluation of the validity of (1) dichotomous
predictors, and (2) continuous predictors at a
particular or at successive points of cut.”—R. Tyson.


4975. Kerrich, J. E. (U. Witwatersrand, Johan-
nesburg, S. Africa.) Normalization of frequency
functions. Nature, Lond., 1949, 164, 1089.—A
technique is described for predicting the trans-
formation required to normalize a frequency dis-
tribution.—A. C. Hoffman.


5606. Iyer, P. V. Krishna. (Indian Council of
Agricultural Research, New Delhi, India.) Differ-
ence equations of moment-generating functions for
some probability distributions. Nature, Lond.,
1950, 165, 370.—“This note […] in seriatim the
difference equations for the following distributions:
(i) the number of joins between points of different
colours when the points can assume one of k colours
with probabilities $p_1, p_2, \ldots, p_k$, (ii) the number of
runs of length r of a specified colour, (iii) the number
of runs of length r or more of a specified colour, and
(iv) the number of triplets, quadruplets, etc., of a
specified colour.”—A. C. Hoffman.

Mood, Alexander McFarlane. Introduction
to the theory of statistics. New York: McGraw-Hill,
1950. xiii, 433 p., $5.00.—Designed as an intro-
ductory text in mathematical statistics, the book
emphasizes the statistical aspects rather than
mathematics per se. Illustrations are given from
many fields in which statistics may be applied.
Concepts of probability theory are first considered
and the development then goes on to matters of
distributions and sampling. Chapters are devoted
to statistical inference including confidence and in-
terval estimation, tests of hypotheses, experimental
design, and the analysis of variance. The emphasis
throughout is on theory although examples of ap-
plication are given in the text and practical problems
are included in the problems following each chapter.
—C. M. Louttit.


5610. Thompson, H.R. (Plant Diseases Division, 
Auckland, N. Z.) Truncated normal distributions. 
Nature, Lond., 1950, 165, 444-445.—A method of 
curve-fitting is described for use with truncated 
normal distributions.—A. C. Hoffman. 





BOOK REVIEWS 


Chance and Choice by Cardpack and Chessboard. Lancelot Hogben (Professor of 
Medical Statistics in the University of Birmingham). New York: Chanticleer 
Press, 1950. Pp. 417 (6×8 type page). $12.50.


George W. Snedecor, Iowa State College


This is a challenging book.

As may be indicated in the title, one purpose of the author is to 
“start, as Pascal started, with some preliminary theorems not commonly 
dealt with in modern text books” in an attempt to “recapture the thought 
of those who laid the foundations of the theory of chance in the seventeenth 
and eighteenth centuries.” A second purpose is to emphasize exact state- 
ments based on discrete distributions (chiefly the binomial) in contrast 
with approximate solutions derived from continuous distributions such as 
the normal. Another purpose is to exploit a “new educational technique” 
which (if I read aright the second paragraph of the Foreword) is the use of 
rather elaborate visual aids. These consist of numerous diagrams, many in 
red and black, depicting arrangements of cards, dice, and balls. Quoting 
from page 58: “Difficulty in dealing with problems of choice, i.e., sampling, 
arises less in connection with the solution of the mathematical problem than 
with the selection of the mathematical operation appropriate to its verbal 
formulation. It is therefore desirable at the outset to define in explicit 
terms conditions relevant to the enumeration and specification of samples. 
Also at the outset, it is well for the beginner to realize that difficulties 
which beset the verbal formulation will be less than otherwise forbidding, 
if study of the visual aids keeps in step with reading of the text.” 

In the title, the use of Whitworth’s alliteration in reverse (in the text, it 
regularly appears in its original order) should not lead one to assume that 
the present treatment is similar to that of the earlier writer. Professor Hog- 
ben’s book is strictly up-to-date. The cardpack and chessboard are made to 
rub elbows with all the rubrics of modern mathematical statistics. Not even 
the controversial topics are avoided, Bayes’ theorem and fiducial probability 
being among the subjects discussed. 

Two introductory chapters dealing with the older topics are followed by 
another pair treating exact tests in binary and 2×2 classifications, the
matter of independence in a contingency table being left for Volume 2. 
“Significance and Confidence” are next discussed with an interesting in- 
troduction called “The Bayes Balance Sheet.” The sixth chapter is an 
“Interlude on the Method of Moments.” Following are chapters on “The 
Recognition of a Mean Difference,” “Correlation and Independence,” and 
“The Nature of Concomitant Variation.” In the last chapter, “Preview of 
Sampling Systems,” the Lexis and Poisson models are treated, followed by an 
anticipatory look at some of the modern experimental designs. 
Throughout the book the technique is the same: the introduction and
development of the discrete model, the progress to a sample size where com- 
putation becomes prohibitive, and the introduction of the continuous model 
“for construction of convenient formulae for computations of sufficient 
precision for practical purposes.” The philosophy of the book is, I think, 
epitomized in this paragraph (page 179): “It is our standpoint that Statisti- 
cal theory, as concerned with sampling, derives its practical rationale from 
the calculus of choice and chance, as first expounded by Pascal; but the 
application of the calculus of choice and chance to large sampling confronts 
us with formidable problems of computation. Consequently, it is the con- 
stant preoccupation of theoretical statistics to substitute for exact state- 
ments which would involve laborious evaluation approximate formulae to 
provide sufficiently safe guidance for practical judgments. The rationale of 
such approximations is referable to purely mathematical considerations 
having no necessary connection with the laws of choice as such; and it is 
therefore easy to lose sight of the nature of the problem which invokes such 
operations in a welter of symbols with no direct relation to it. We can keep 
our feet on solid ground only if we constantly restate the problems in exact
terms as a prelude to the derivation of an approximate solution; and we 
can keep ourselves alert to the limitations of such solutions only if we do so.” 
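The program in the passage just quoted, an exact statement first and an approximation second, can be illustrated for a binomial tail. The Python sketch below computes $P(X \le x)$ exactly by summing binomial terms and then by the normal approximation with a continuity correction, so that the two can be compared. The function names are hypothetical, and the example is illustrative rather than Hogben's.

```python
import math


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x), with a continuity correction of 1/2."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (x + 0.5 - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For $n = 10$, $p = 1/2$, $x = 3$, the exact value is $176/1024 \approx 0.172$ and the approximation agrees to within about 0.001.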

The reader will never be bored while perusing this text. Pungent phrases 
and sly digs abound. Pervading the volume is a delightful humor now impish 
and again sardonic, not infrequently taking the form of a supererogatory 
adhibition of polysyllabic verbiage. At least half in fun, I think, is the array 
of picturesque labels designating mathematical models; the chessboard and 
staircase models, for example; the umpire-bonus model (common elements) 
and the handicap score-grid model (two-factor experiment or randomized
blocks). These are apt descriptions of the manner in which chance events 
may occur in the realm of a priori probability. 

Chance and Choice is not for cursory reading. The lack of an index 
penalizes skipping. Numerous cross references to numbered sections require 
thumbing through the pages because neither the chapter nor section number 
is indicated in the headings. With few exceptions conventional terminology is 
displaced by new, the technical terms being introduced casually in chapter 
headings, in tables and examples, as well as in the text. The arguments are 
drawn out over scores of pages with more or less relevant intercalations. 

The book is not addressed to any one audience. For the beginner, there are 
the visual aids and several introductory passages which are models of 
exposition. The mathematician will find convenient summaries of the idioms 
of earlier days. But these need not hamper the layman because the verbal
logic is adequately developed in parallel. The professional statistician will be 
interested in the polemics as well as in the originality of the viewpoint. 
There are few, I think, who will not find their wits sharpened and their 
ideas marshalled in more orderly ranks by Professor Hogben’s meticulous 
insistence on fine distinctions. Chance and Choice might well be included 
in any list of required reading for students of statistics at the first or second
year of graduate study (in American colleges). 

The errors in this massive work are remarkably few. The occasional slips 
which I detected do not slow down the reading, being quite obvious. One 
exception may be mentioned, not because it has any importance in itself 
but because it typifies the way in which ideas must be pursued in reading 
this book. Chapter 4 is titled, “The Recognition of a Taxonomic Difference.” 
Being conditioned to novel terminology, I wondered what special meaning 
would be assigned to “Taxonomic.” After reading 10 pages in the chapter, 
I found the first use of the term: “This will become apparent at a later 
stage when we extend to representative scoring (vide 4.10 infra) the mathe- 
matical theory of probability here applied to taxonomic scoring w.r.t. binary
classification.” Perhaps it is plain by inference that taxonomic scoring is 
here defined as the scoring done in binary classifications, but to make sure I 
turned forward 40 pages to section 4.10. There I found nothing pertinent, 
but did observe another reference, “vide infra, Chapter 6.” But Chapter 6 
is “Interlude on the Method of Moments,” so I gave up that lead. It turned 
out that the reference should have been to section 4.11 where, on the last 
page of the chapter, I found this statement, “The problem of detecting a 
difference dealt with in this chapter is how to detect a difference which in- 
volves a taxonomic score.” I finally concluded that a taxonomic difference is 
the difference between the relative frequencies of attributes. 

Similarly, one must read odds and ends of paragraphs from page 62 to 
page 95 to learn the connection between “electivity (i.e., mathematical 
probability)” and “probability in common parlance” (chance?). For this 
connection, Professor Hogben repudiates the phrase, “equally likely” and 
substitutes “equipartition of opportunity for association,” the final appeal 
being to “intuition which is common sense.” 

The beginner should be warned of a few deviations from the exact specific- 
ity for which Professor Hogben strives. Four examples: “The normal curve 
... is thus the limit to which the contour of the binomial histogram attains 
when r is indefinitely large” (page 123); “Tchebychev’s Theorem... is 
noteworthy because its proof entails no assumption concerning the nature 
of the distribution . . . ” (page 134); “It is customary to designate as fiducial 
limits the boundaries fixed by a critical ratio calculated from a value of ¢ 
by recourse to the approximation pop...” (page 213); “Thus, the odds 
are 20:1 that the true value of p will be within the range 0.105 and 0.067 if 
the null hypothesis is correct...” (page 214).

I wish that Professor Hogben had been more liberal with references to his 
sources. The student who will profit most by this book is the one who will 
wish to extend his readings to the original papers. He will have to look else- 
where for most of his clues. 

Professor Hogben makes this text a vividly personal contribution. I have 
a strong feeling that whom having not seen I know. I would not willingly 
have foregone the reading of his book. 
Statistics, Volume I. N. L. Johnson and H. Tetley. Cambridge, England: Cam-
bridge University Press, 1949. Pp. xii, 294. $5.50. 


Paul S. Dwyer, University of Michigan


This volume is one of the text books published under the authority of the
Institute of Actuaries and the Faculty of Actuaries in order to meet the
needs of students studying for the examinations of the new Syllabus of the 
Institute of Actuaries. Volume I, which is labeled as “an intermediate text 
book,” is to be followed by the more advanced Volume II and, indeed, 
several references are made to the specific chapters of the second volume.
The term “intermediate” as explained by the authors in the preface is used 
“to indicate that the level of treatment is between that of elementary text- 
books requiring little or no mathematics beyond matriculation stage, and 
that of advanced text-books and original papers employing, where necessary, 
the resources of modern analysis.” This book should not be confused with 
the earlier Volume I of Actuarial Statistics by Tetley (which was written 
before the revision of the Syllabus) though there are portions of the two 
books which are similar. 

The book is written at about the level of courses in finite differences and 
algebraic probability theory involving urns, dice, etc. The typical actuarial 
student will be studying these subjects concurrently. The first volume does 
not include small sampling theory, which will be included in the second 
volume. The actuarial student will study the second volume concurrently 
with life contingencies, the theory of mortality statistics, etc. 

The authors feel that practical acquaintance with the problems to be 
attacked should precede the presentation of the theory, after which the
application of the theory to the practical problems should be made. This is 
the basis of the general organization of the book. The first four chapters are 
used to discuss the reduction and summarization of masses of observed data. 
The second four chapters are used to present probability theory and such 
mathematical concepts as expected value, random variable, and probability 
distribution. Application of the theory is made in the last two chapters. 
Chapter 9 deals with statistical hypotheses and chapter 10 deals with the 
more common tests of significance involving means, standard deviations, and 
proportions. Each is presented in mathematical form with a careful state- 
ment of the particular hypothesis being tested. Applications are to large 
samples and, admittedly, some of the discussion is given with a view to 
providing background material for the more advanced and detailed pre- 
sentation which will appear in chapter 11 (the first chapter of the second 
volume). On the whole one gets the impression that those tests which are 
given are presented in precise, yet concise, form. 

The book proper is followed by appendices on corrections for grouping, 
Stirling’s approximation to n!, and the normal curve as an approximation 
to the binomial probabilities. The last appendix consists of a short table 
listing values of the ordinate and cumulative distribution function for the 
normal curve. This is the only statistical table of this kind which is needed
since the various statistical tests of chapter 10, for large samples, reduce to 
normal form. 
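The appendix on Stirling's approximation can be checked numerically. The Python sketch below compares $\sqrt{2\pi n}\,(n/e)^n$ with $n!$; the ratio approaches 1 from below, with a relative error close to $1/(12n)$. Nothing here is taken from the book beyond the formula itself, and the function name is hypothetical.

```python
import math


def stirling(n):
    """Stirling's approximation sqrt(2 pi n) * (n / e)**n to n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


# Ratio of the approximation to the exact factorial for a few n; it rises
# toward 1 from below as n grows.
ratios = {n: stirling(n) / math.factorial(n) for n in (1, 5, 10, 20)}
```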

The book is provided with an adequate amount of illustrative examples. 
Substantial exercises are provided at the end of each chapter. Actuarial 
applications are featured, but not exclusively. The illustrations have general 
interest. 

The authors have indicated in their preface “although this book is in- 
tended primarily for those studying for the statistical sections of the ac- 
tuarial examinations, it is hoped that it will be found useful in a consider- 
ably wider field.” It is my opinion that the authors have every right to 
expect their hopes to be fulfilled. The precise and concise treatment, the 
detailed material selected, as well as the illustrations and examples, will be 
useful to many students not primarily interested in actuarial studies. 


Applied Statistics. Forrest R. Immer. Minneapolis, Minnesota: Burgess Publish- 
ing Company, 1950. Mimeoprint, spiral bound. Pp. ii, 157. $3.25. 


Emil H. Jebe, Iowa State College


This volume comprises a series of lectures given by the late Dr. Forrest R.
Immer in the Graduate School of the University of Minnesota. It is
pointed out that the lectures were prepared 1935-1946 and that they are 
published posthumously without editing, July, 1950. There are 28 lectures 
in Applied Statistics and they cover a wide range of subjects. Each lecture 
takes up a particular statistical technique in connection with a specific set 
of data. From the table of contents, we note (in part) this order of topics 
for the lectures: “The analysis of variance, randomized block trials, The 
“t” test, ... Calculation of coefficients of correlation and regression from 
analysis of variance and covariance, Estimating the yield of a missing plot, 
Split plot experiments, ... Sampling technique, Calculation of quadratic 
regression, Linear regression, Use of preliminary trial data for subsequent 
experiments, The χ² test, . . . Regression with several independent variables,
Analysis of variance with dis-proportionate sub-class numbers. . . .” The
last 70 pages are principally devoted to the analysis of incomplete-block 
designs including the simple, triple, and balanced lattices. In general, the 
discussion of the statistical techniques and the illustrations are directed to 
the plant breeder and agronomist. 
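The review lists "Estimating the yield of a missing plot" among the topics but does not give the method. The usual procedure for a single missing plot in a randomized block trial is Yates' formula $x = (bB + tT - G)/\bigl((b-1)(t-1)\bigr)$, where $b$ and $t$ are the numbers of blocks and treatments and $B$, $T$, $G$ are the observed block, treatment, and grand totals. It is assumed here, without confirmation, that the lectures use this standard formula; the Python sketch and its names are illustrative.

```python
def missing_plot_estimate(table, missing):
    """Yates' estimate for one missing plot in a randomized block trial.

    table: list of rows (one per block), each with one yield per treatment,
           and None marking the single missing plot.
    missing: (block_index, treatment_index) of the missing plot.
    """
    b = len(table)       # number of blocks
    t = len(table[0])    # number of treatments
    i, j = missing
    B = sum(v for v in table[i] if v is not None)               # observed block total
    T = sum(row[j] for row in table if row[j] is not None)      # observed treatment total
    G = sum(v for row in table for v in row if v is not None)   # observed grand total
    return (b * B + t * T - G) / ((b - 1) * (t - 1))
```

For an exactly additive table the formula returns the deleted value, since the estimate minimizes the error sum of squares and the true value makes that sum zero.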

One is impressed by the general excellence of the examples used through- 
out most of Dr. Immer’s lectures. Several of these examples are classics in 
the literature for which Dr. Immer is already well known. I note in particular 
the barley varietal tests at several locations that have been cited by Fisher, 
Snedecor, Yates, and others. The comparison of six methods for determining 
the significance of the difference between two means, pp. 11-15, is informa- 
tive. 
It should be noted that Applied Statistics is not a beginner’s book. This
collection of lecture notes or apparently lecture supplements must be viewed 
as quite incomplete. In its present form a considerable knowledge of the sub- 
ject is required before the booklet can be read with understanding. This 
brings up the point as to whether or not the title itself is appropriate. There 
is not an orderly treatment of the topics of statistical methods in the booklet. 
A more suitable title that suggests itself is “applications of selected statistical 
methods in field experimentation” with complete numerical solutions of 
examples. Note the emphasis on numerical; derivation, explanation, and 
interpretation of the techniques employed must be sought elsewhere. 

It appears questionable that the compilers of this volume are doing justice 
to Dr. Immer’s memory in presenting Applied Statistics to the public. The 
contents, style of presentation, and statistical quality are entirely incom- 
patible with the reputation of Dr. Immer as a scholar and his renown as a 
scientist. Certainly, Dr. Immer himself would not have published these 
lectures without editing.

The matter of the incompleteness of these lecture materials merits further 
comment. In his classes, Dr. Immer surely must have given more to his
students than appears in this booklet of lectures. As a specific example there 
is given in Hayes and Immer, Methods of Plant Breeding, (McGraw-Hill, 
1942) a reasonable account of recovery of inter-block information for in- 
complete block designs, making it clear that Dr. Immer was familiar with 
the advancing knowledge in this field. In Applied Statistics, however, the 
connection between the original intra-block analysis and the recovery of 
inter-block information is not at all clear. In fact, on pages 108-109 and 151- 
152 the same description and exactly the same data are presented but the 
analyses conducted are different. The uninitiated student would have great 
difficulty in figuring out the why and wherefore of the numerical results 
obtained. Since Dr. Immer is loved, revered, and respected by many it seems 
unfair to have committed these lectures to publication without extensive 
revision. 

Issue must also be taken with another point in the published description 
of the booklet, “... one of the values of Dr. Immer’s writing is that he
always took a set of data and followed through all the computations step 
by step. This means that one not acquainted with the theory may still apply the 
methods” (italics mine). In no way can one accept the last sentence just 
quoted nor can I imagine Dr. Immer making such a statement. Certainly, 
the content of Applied Statistics does not support the quoted opinion. The 
first lecture opens directly with the analysis of variance for randomized 
block trials. The assumptions of the analysis of variance are not explicitly 
discussed. A mathematical model for the population from which the data 
provide a sample is not even mentioned. The expectations of the mean 
squares in the analysis of variance table are not given. Throughout the series 
of lectures it appears that an analysis-of-variance model is always assumed, 
i.e., the treatments studied represent fixed effects; at no place is a
components-of-variance model adequately discussed, although one example
applies the components analysis to a sampling technique problem. Similar
comments could be made for a number of other lectures in the booklet. 

Perhaps the greatest value of these lecture supplements to Dr. Immer’s 
students and to statisticians in general will be found in the numerically 
complete examples. In this restricted sense, the booklet may fulfill one of its 
stated purposes: “It will serve both as a practical work book and a memorial 
to Dr. Immer” (quoted from the publisher’s notes). 


The Principles of Scientific Research. Paul Freedman. Washington, D. C.: 
Public Affairs Press, 1950. Pp. xi, 222. $3.25. 


C. West Churchman, Wayne University


This book consists in the main of somewhat disorganized reflections of a
icon research worker in the physical sciences concerning the best
method of approaching and solving research problems. There is a general 
discussion of the history and philosophy of “research,” the planning and 
organization of research, the steps of experimentation, the use of statistics, 
and finally some succinct remarks on “patrons” (industrial researchers in 
this country would refer to them as “brass”). 

The book is written wisely, by which I mean that there are many comments 
and pieces of advice that can only come from a wise head that has acutely 
observed the many aspects of the researchers’ world. For example, the author 
advises the young researcher to go to the original sources rather than rely on 
texts alone, since texts tend to neglect the precautions and doubts of the 
original (p. 83). He points out that one may often avoid complicated meas- 
uring techniques by employing, for instance, the sense of touch rather than 
the sense of sight (p. 104). He says that it is always a good idea to keep note 
books, especially if one may subsequently be challenged for originality (p. 
163). Such sage pieces of advice cannot help but assist the novice in research 
to understand better the kind of world he is to inhabit. 

The function of a review should also consist in indicating the limits of a 
work, especially when the “The” of its title implies a much more extended 
treatment of the topic than the book actually covers. First, the author 
restricts “research” to the physical sciences, and indeed seems to regard social 
research as non-scientific. Second, there is no systematic treatment of the 
entire problem of research method; for example, the crucial problem of the 
logic of testing hypotheses is only treated indirectly. Third, his remarks 
on the history of philosophy must certainly be taken cum grano salis. Last, 
the treatment of statistical techniques, though involving some nice ideas, 
is very sketchy. 

The book makes pleasant reading and will provide an excellent addition 
to the library of a research laboratory, though I’d suggest that any “Junior” 
scientist discuss his reading of the text with his “Senior” before acting on the 
conclusions. 
Scientific Method for Auditing: Applications of Statistical Sampling Theory to
Auditing Procedure. Lawrence L. Vance (Associate Professor of Accounting, 
University of California). Berkeley and Los Angeles: University of California 
Press, 1950. Pp. xii, 108. $2.50. 


Howard L. Jones, Illinois Bell Telephone Company


The stated purpose of this book is to make available to the accounting
profession some of the techniques developed by statisticians for the
interpretation of samples. Discussion is largely confined to possible uses of these
techniques by a public accountant to determine whether a client’s records 
are sufficiently free of clerical errors to warrant reliance on these records in 
certifying financial statements. Sequential procedures for acceptance sam- 
pling of attributes are stressed. 

This is a pioneer book in the field, and it contains a few excellent ideas as 
to why and where statistical inference might be applied in auditing work. The 
usual formulas for computing a term in the binomial expansion, for finding 
the parameters of a sequential sampling plan, and for expressing a sample 
frequency in standard deviation units, are given and the substitution of 
actual values in these formulas is illustrated. But the author does not ap- 
pear to appreciate precisely what one can and cannot deduce about the 
population sampled when his computations are applied to practical prob- 
lems; and his pronouncements and recommendations in this direction usually 
fall short of the mark. 

To elaborate on the favorable features, the chief merit of the book lies 
in its calling attention to the unsoundness of sampling methods commonly 
employed by the accounting profession. Some such sampling, or “test- 
checking,” is a practical necessity in making a balance sheet audit, since the 
cost of verifying every accounting entry would usually be prohibitive. Cur- 
rent practice is to audit every item of material size; but to audit only that 
number of smaller items, and for the time and place, which the judgment of 
the auditor dictates. As the author points out, this practice results in biased 
samples of the records, and makes it difficult to draw conclusions about 
their quality. Scientific sampling would overcome this difficulty, and enable 
the auditor to make objective decisions with known risks that the decisions 
might be wrong. 

Applications to various types of records are suggested. Specifically men-
tioned are inventory records, accounts receivable, records of capital and 
maintenance expenditures, and other books of original entry or related 
records. Two actual cases are described where sequential sampling would 
have led to the conclusion that inventory records were substantially correct 
upon examining a relatively small number of items. In a third case the same 
procedure would have revealed the poor quality of certain bills received by 
a large transportation company in connection with interline freight settle- 
ments. No instances are cited, however, where scientific sampling has actu- 
ally been employed and relied upon. 
The book’s deficiencies with respect to the interpretation of sample data
begin to appear on page 4. In the introductory explanation of probability 
and statistical inference, the question is raised as to the interpretation to 
be placed on a sample consisting of 8 defective and 17 nondefective light 
bulbs selected at random from a lot that is supposed to have at most 10 per 
cent defective. As the author says, “a random sample of 25 from a population 
10 per cent defective will contain 8 defectives 0.180 per cent of the time, 
and it will contain that number or fewer defectives 99.954 per cent of the 
time. . . . [It] will have less than 8 defectives 99.9 per cent of the time... .” 
This seems an awkward way to say that if the fraction defective is actually 10 
per cent, a sample of 25 containing as many as 8 defectives will happen only 
0.226 per cent of the time, if this is the thought which the author wishes to 
convey. 
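The percentages quoted above can be checked directly from the binomial distribution with a sample of 25 and a fraction defective of 10 per cent. The short Python sketch below (the function name is ours, not the author's) computes the chance of exactly 8 defectives, of 8 or fewer, and of 8 or more; the last figure is the single probability the reviewer suggests stating.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k defectives in a random sample of n
    from a lot whose fraction defective is p (binomial model)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p, k = 25, 0.10, 8
p_exactly = binom_pmf(k, n, p)
p_at_most = sum(binom_pmf(i, n, p) for i in range(k + 1))
p_at_least = sum(binom_pmf(i, n, p) for i in range(k, n + 1))

print(f"P(X = 8)  = {100 * p_exactly:.3f} per cent")
print(f"P(X <= 8) = {100 * p_at_most:.3f} per cent")
print(f"P(X >= 8) = {100 * p_at_least:.3f} per cent")
```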

The sequential sampling of attributes is introduced in the first chapter, 
and a table is given and tentatively recommended in which the risks of
making wrong decisions are $\alpha = .05$, $\beta = .10$, for alternative hypotheses that the
fraction defective is $p_1 = .005$, $p_2 = .03$ (and also .05). In connection with
these risks, the apologetic statement that “professional statisticians have 
generally used levels of 90 per cent to 99 per cent as a basis for decisions” is 
likely to be misleading. The rules for deciding whether to accept or reject 
the lot are first stated in the usual manner; but later discussion indicates 
that any such decision is to be regarded as only tentative until confirmed by 
another “decision” in the same direction. The formulas for sequential sam- 
pling, of course, were derived by Wald on the assumption that the initial 
decision is irrevocable; and it is likely that no one has ever tried to determine 
the probabilities for the procedure which is here recommended. Also ob- 
jectionable is the author’s implication that his sequential sampling tables 
are appropriate for the interpretation of single sampling results, or for 
testing the hypothesis that two samples are from the same population. 
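For reference, the acceptance and rejection lines of Wald's sequential test for attributes follow from the four constants quoted above. The sketch below uses the standard textbook form of these lines, with our own function and variable names, for the first-stated alternatives $p_1 = .005$ and $p_2 = .03$. It assumes, as Wald did, that the first decision reached is final.

```python
from math import log

def sprt_lines(p1: float, p2: float, alpha: float, beta: float):
    """Return (h1, h2, s) for Wald's sequential test of p1 against p2.

    After n items with d defectives:
      accept the lot if d <= -h1 + s * n
      reject the lot if d >= h2 + s * n
      otherwise keep sampling.
    """
    g = log(p2 / p1) + log((1 - p1) / (1 - p2))
    h1 = log((1 - alpha) / beta) / g
    h2 = log((1 - beta) / alpha) / g
    s = log((1 - p1) / (1 - p2)) / g
    return h1, h2, s

def sprt_decision(n: int, d: int, h1: float, h2: float, s: float) -> str:
    """Classify the position (n, d) relative to the two decision lines."""
    if d <= -h1 + s * n:
        return "accept"
    if d >= h2 + s * n:
        return "reject"
    return "continue"

h1, h2, s = sprt_lines(p1=0.005, p2=0.03, alpha=0.05, beta=0.10)
print(f"h1 = {h1:.3f}, h2 = {h2:.3f}, s = {s:.5f}")
print(sprt_decision(n=100, d=0, h1=h1, h2=h2, s=s))
```

If, as the book recommends, a decision is held only tentatively until a second one agrees with it, the nominal risks $\alpha$ and $\beta$ used in these formulas no longer describe the procedure actually followed.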

The existence of other sampling tables, such as those given by Dodge- 
Romig and those in Sampling Inspection, is nowhere mentioned, although 
the latter volume is listed in the bibliography. The possibility of using 
statistical inference to estimate the fraction defective when there is no 
restriction on the number of admissible hypotheses seems to be discouraged, 
apparently for no better reason than the fact that “to say precisely what 
is the percentage of errors in the actual population ... could be done... 
only by examining all the items. . . .” The statement is made that “statisti- 
cal inference will be of little benefit in looking for fraud,” although reference 
is made to Lewis A. Carman’s article in The American Accountant for 
December, 1933, where formulas for approximating the pertinent probabil- 
ities are given. In spite of these formulas, Vance also asserts that “whatever 
the means by which the logic of binomial conditions is satisfied, the im- 
portant conclusion is that the absolute size of the sample is the important 
element in judging the effectiveness of the sample to disclose one or some 
other number of defectives upon the selected probability basis.” A less suit-
able application of an excellent principle can hardly be imagined. The sug- 
gestion that sequential sampling tables be used “to give a certain assurance 
that 2 or 3 or any desired number of defectives would be seen in a population 
of specified quality” is another example of suggestions and recommenda-
tions that will be annoying to statisticians and confusing or misleading to 
accountants. 

An interesting test for bias is suggested several times. Sample data are 
first analyzed to determine the number of sample errors that tend to over- 
state some selected ledger account, and the number of errors that tend to 
understate it. The null hypothesis that the number of errors is the same in 
each direction for the entire population of errors is then tested by comparing 
the difference between the larger number of errors and half the sum of the 
two types of errors with three times the standard deviation of the difference. 
The procedure here is technically correct; but in a large population of errors, 
the possibility that the number of errors in each direction will be exactly the 
same is too slight to warrant giving it serious consideration. For this reason, 
any negative result of one of these tests should be interpreted as indicating, 
not that the null hypothesis may be correct, but that the sample is too small 
to permit a conclusion as to the direction of the preponderant number of 
errors. Moreover, even when one knows which type of error is more frequent, 
he is still not in a position to say whether the sums of these errors will cause 
the ledger account to be overstated or understated. 
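The bias test just described can be written out in a few lines. The review does not give the formula for the standard deviation, so we assume the usual one: under the null hypothesis each sample error is equally likely to overstate or understate the account, so the count in either direction is binomial with probability 1/2, and its deviation from half the total n has standard deviation sqrt(n)/2. The function name and the counts in the example are hypothetical.

```python
from math import sqrt

def bias_test(overstating: int, understating: int):
    """Compare (larger count - n/2) with three standard deviations,
    assuming each error is equally likely to go either way."""
    n = overstating + understating
    difference = max(overstating, understating) - n / 2
    three_sd = 3 * sqrt(n) / 2
    return difference, three_sd, difference > three_sd

print(bias_test(28, 12))  # hypothetical counts: 40 errors in all
print(bias_test(70, 30))  # hypothetical counts: 100 errors in all
```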

Some of the author’s views on selecting and classifying the sample data 
are open to question. To illustrate, the suggestion to ignore the last item in 
a population of size 100 when selection is made with the aid of a table of 
random numbers is indefensible (and overlooks the usefulness of the 00’s in 
the table). Again, while something like the procedure described as
“subjective randomizing” is necessary now and then as a matter of economy, there is
mischief in the author’s assurance to the reader that the sampler cannot 
“see” the attribute of the item he selects until he has audited it, and “cannot 
therefore exercise any kind of choice as to items which do and do not repre- 
sent errors.” On the other hand, the author’s cautions against sampling 
without replacement when the sample is larger than 10 per cent of the 
population are superfluous for most practical problems. And the conclusion
that “it is desirable that the methods used [to classify sample items] provide 
as large a number of errors as a reasonable definition can make possible, 
since .. . a sufficient number for purposes of observation is essential to any 
mathematical treatment of them” will be concurred in by few mathema- 
ticians or accountants. 
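The point about the 00's can be shown briefly: with two-digit random numbers running from 00 to 99, every item of a population of 100 can be reached by letting 00 stand for item 100, so no item need be ignored. In this sketch Python's `random` module stands in for a printed table of random numbers (an assumption for illustration only), and the helper name `item_from_pair` is hypothetical.

```python
import random

def item_from_pair(pair: int) -> int:
    """Map a two-digit random number 00-99 to an item number 1-100,
    letting 00 stand for item 100."""
    if not 0 <= pair <= 99:
        raise ValueError("expected a two-digit number 00-99")
    return 100 if pair == 0 else pair

# Every item 1..100 corresponds to exactly one two-digit number.
assert sorted(item_from_pair(x) for x in range(100)) == list(range(1, 101))

# Draw 10 distinct items, discarding numbers already used.
rng = random.Random(1951)  # seeded stand-in for a random-number table
sample = []
while len(sample) < 10:
    item = item_from_pair(rng.randint(0, 99))
    if item not in sample:
        sample.append(item)
print(sample)
```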

Perhaps the worst part of the book is the section on stratified sampling. It 
is stated that this procedure requires “that the samples taken from each 
category be related in size by the same proportions.” The truth, of course, is 
that such sampling is usually most advantageous when the sampling ratios 
required for the various strata are different. The illustrative statement 
that “if 30 per cent of the extensions of an inventory are more than $5,000, a 
sample of 100 should be selected for the purpose of stratified sampling by
drawing 30 items from the less than $5,000 group” seems to be inadvertent.
The table following appears to illustrate a hypothetical case where a sample 
of 100, distributed proportionally over strata representing 20%, 30%, and 
50% of the population, contains 3 errors, 2 errors, and 4 errors from these 
respective strata. But instead of drawing the common-sense conclusion that 
the fraction defective is around 9%, the author computes a “weighted 
average” as if 100 items were selected from each stratum, whence “the result 
is interpreted as though 3.2 errors had been found in a sample of 100 taken 
randomly from the whole population.” The objectionable suggestion is then 
made that a sequential table be entered at the point corresponding to 3.2 
errors in a sample of 100. In a second illustrative example, the same mistakes 
are made, and the sample is treated as though only 1.82% defective, al- 
though items from one of the two strata are known to be 6% defective, and 
sample items from the other stratum are nearly 3% defective. 
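The arithmetic of the first illustration can be laid out explicitly. With proportional allocation the stratum weights are 0.20, 0.30, and 0.50, the stratum samples 20, 30, and 50 items, and the errors found 3, 2, and 4. Weighting each stratum's error rate gives the common-sense 9 per cent; weighting the raw error counts, as if each had come from a sample of 100, reproduces the book's 3.2. The sketch is ours, not the author's.

```python
weights = [0.20, 0.30, 0.50]   # share of the population in each stratum
sample_sizes = [20, 30, 50]    # proportional allocation of a sample of 100
errors = [3, 2, 4]             # errors found in each stratum

# Stratified estimate: weight each stratum's sample error rate.
estimate = sum(w * d / n for w, d, n in zip(weights, errors, sample_sizes))

# The book's "weighted average": weight the raw counts instead.
book_figure = sum(w * d for w, d in zip(weights, errors))

print(f"estimated fraction defective: {estimate:.1%}")
print(f"book's weighted count: {book_figure:.1f} errors per 100")
```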

Attempts to introduce statistical methods in auditing are laudable and 
certain to be successful eventually. It is unfortunate that the attempt 
which is here reviewed does not advance us toward the objective. 


New Facts on Business Cycles. Arthur F. Burns. National Bureau of Economic 
Research, Inc., 30th Annual Report, 1950. Pp. 83. 


Kenneth D. Roose, Oberlin College


The “grand synthesis”¹ of cyclical measurements which was promised
some years ago by the National Bureau of Economic Research seems to 
have been previewed by Arthur F. Burns in the 30th Annual Report of the 
Bureau. While there may have been justifiable criticism in the past of the 
reticence of the Bureau to advance causal hypotheses about business 
fluctuations, this latest statement should silence much of the criticism. For 
Burns is prepared to go so far as to assert that “...the check to the 
dominant movement of business activity, whether it be expansion or con- 
traction, is typically felt especially early in financial processes and activity 
preparatory to investment expenditure.” (p. 21) In still other respects he 
seems to make concessions to any who would charge the Bureau approach 
with being too mechanical. Thus he says, “I take it as a matter of course 
that it is vital, both theoretically and practically, to recognize the changes 
in economic organization and the episodic and random factors that make 
each business cycle a unique configuration of events.” (p. 27) But having 
made this concession, he emphasizes that the statistical records reveal the 
repetitive nature of business cycles and as such add to the “understanding 
of the business cycles, and may even prove helpful in predicting reversals 
in the direction of total economic activity—or at least in identifying them as 
such promptly.” (p. 27) 

1 Arthur F. Burns, Economic Research and the Keynesian Thinking of Our Times, National Bureau 
of Economic Research, 26th Annual Report, 1946, p. 26. 

266 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1951


The new facts, some of which are new only in the sense that National 
Bureau data and analysis give them a stronger empirical foundation, fall 
into two somewhat overlapping classes: (1) generalizations about the nature 
of business cycles; (2) conclusions about particular factors in business cycles.
Under (1), National Bureau analysis reveals two cycles in economic
activity: the movements of aggregates and, within the aggregates, the
movements of specific components (the unseen cycle). Examination of these
latter makes it apparent that not all components expand as the aggregate 
expands, nor do all components decline as the aggregate declines. Thus a 
comparison of the movements of the specific cycles with the aggregate cycle 
shows “that the proportion of expanding activities is already declining 
months before aggregate activity reaches a peak, and is already rising 
months before the aggregate reaches its trough.” (p. 11) 
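The comparison of specific cycles with the aggregate can be illustrated with a simple count of the proportion of component series rising from one month to the next (often called a diffusion index). The series below are invented solely to show the computation and are not National Bureau data; the function name is ours.

```python
def share_expanding(components: list[list[float]]) -> list[float]:
    """For each month after the first, return the fraction of component
    series that rose since the previous month."""
    months = len(components[0])
    shares = []
    for t in range(1, months):
        rising = sum(1 for series in components if series[t] > series[t - 1])
        shares.append(rising / len(components))
    return shares

# Hypothetical component series (not actual data).
components = [
    [10, 11, 12, 12, 11, 10],
    [20, 21, 21, 20, 19, 19],
    [30, 32, 34, 36, 38, 36],
    [5, 6, 7, 8, 9, 8],
]
aggregate = [sum(month) for month in zip(*components)]
print("aggregate:", aggregate)
print("share expanding:", share_expanding(components))
```

In this made-up example the share of rising components starts to fall in month 2 (counting from 0), while the aggregate keeps rising until it peaks in month 4.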

Under (2), these generalizations about the nature of business cycles point 
the way to new facts about particular series in the cycle. One stresses the 
consistency and regularity with which leads and lags in certain component 
series persist from cycle to cycle: “...as a rule, the maxima and minima 
of investment orders lead the corresponding turns of production, which 
again lead the corresponding turns of income payments.” (p. 18) The 
series included in investment orders are new orders, construction contracts 
or permits, stock prices and transactions, security issues, business incorpora- 
tions, and hours worked per week. Series which tend to move with aggregate 
economic activity are production, employment, commodity prices, imports, 
and business profits. Laggards are income payments, wages, interest rates, 
retail sales, and inventories. 

A second key observation is that the number of firms experiencing
increasing profits declines before aggregate profits decline and begins to
increase before aggregate profits begin to increase. Burns believes therefore
that examination of aggregate profits alone obscures important
developments within profits which may shed light on possible reversals in economic
direction. 

The final facts are directed toward the forecasting problem. The National 
Bureau has made further progress in grouping significant economic series 
which consistently lead or lag fluctuations in economic aggregates. The dis- 
tinction between movements of the aggregates and their components may 
permit earlier recognition that an expansion is coming to a halt or a con- 
traction is about to end. The report concludes by pointing up the need for 
economic theorists, historians, and statisticians alike to concentrate effort 
on determining why some depressions are so mild and others are so severe. 

To this reviewer, such a definite statement on the implicit theory lying 
behind the work of the National Bureau, as well as the apparently revealing 
insights into the problems of fluctuation which give promise to come from 
the Bureau analysis, is a most encouraging and welcome development. 
Shares of Upper Income Groups in Income and Savings. Simon Kuznets. Na-
tional Bureau of Economic Research, Occasional Paper 35, 1950. Pp. 68. $1.00. 


Dorothy S. Brady, University of Illinois


Simon Kuznets has probably devoted more attention to income distribution
as an important field for empirical research than any other economist.
For many years he has actively sponsored the improvement in the
collections of basic data and the analysis of the existing statistics and he has 
personally contributed to the development of research on this subject. His 
article on “National Income” in the Encyclopedia of the Social Sciences 
(1933) included one of the first discussions of the relation of income distribu- 
tion to national income statistics. His book with Friedman on Income from 
Independent Professional Practice (1945) is from many points of view the
best analytic study of income distribution. In “The Why and How of Dis- 
tributions of Income by Size” (Studies in Income and Wealth, Vol. 5, 1943), 
he attempted to clarify the duality in function of income distribution 
statistics for empirical analysis. “...our goal,” he concluded, “is a size 
distribution so prepared as clearly to reflect income as a cause. But many 
uses of the correlation between income size and its consequences depend .. . 
upon what we know of the factors determining the size of the income; yet 
the identification of these determinants may call for size distributions differ- 
ent from those called for by attempts to study the bearing of income size 
upon actions of the recipients.” 

This NBER Occasional Paper is a summary of a searching investigation of 
the characteristics of the income distribution to be published in a two- 
volume report. In the study, Mr. Kuznets manages to prevail over intrac- 
table data and focus on the income distribution as the “key link in the 
economic mechanism.” By comparing the data from federal income tax 
returns, after many detailed adjustments and interpolations, with the in- 
come of the total population he constructed a series of annual estimates of 
the shares of the upper income groups in total income, defined in three ways, 
and in the totals for various types of income. On the assumption, supported 
by evidence from cross-section surveys for scattered dates, that the savings- 
income ratio in the upper income groups is relatively constant, he calculated 
their relative shares in total savings. The tax data limit the definition of the 
upper income group, which was taken as the upper 5 per cent of the popula- 
tion, and determine the span of the series, 1913 to 1947. Because many de- 
tails were not available for the earliest and the latest years, the main part of 
the study relates to the period 1919 through 1945. 

The war years 1939-45 saw a sharp drop in the shares of the upper income 
groups in income and savings. Since the information for the period of the 
first world war indicates a similar, though not as steep a decline, and since 
the data for the recent years show a change in direction, the author based 
his most significant conclusions on the two decades between wars, 1919- 
1938. In those years the shares of the upper income groups in total income, 
and hence their shares in total savings, moved counter to the business cycle.
The positive correlation of fluctuations in the over-all savings-income ratio 
with the business cycle must, accordingly, be traced to wide variations in 
the savings-income ratio of the lower income groups. If the savings of the 
upper and the lower groups flow to different kinds of investment, these con- 
clusions have important consequences for the study of cyclical variations in 
saving and investment. 

These conclusions, carefully stamped as tentative by the author, depend 
fundamentally on the changes in the income distribution and in the over-all 
savings ratio through the business cycle. The over-all savings-income ratio $r$
is a weighted average of the savings-income ratios $r_1$ and $r_2$ of the upper and
lower income groups, respectively: $r = w_1 r_1 + w_2 r_2$. The weights $w_1$ and $w_2$ are
the fractions of total income going to the upper and lower income groups.
Kuznets’s analysis is based on observations of $r$, $r_1$, and $w_1$ for various dates.
The cyclical pattern of $r$ is fairly well established, although its amplitude is
not certain. The relative stability of $r_1$, clearly exhibited in the data from 8
cross-section surveys between 1929 and 1948, is to a considerable extent a
necessary consequence of the structure of definitions. If individual
savings-income ratios are positively correlated with income, they cannot vary as
much from year to year among the upper as among the lower income groups,
for there is an upper bound to the percentage of income that can be saved.
On the other hand, there is no necessary lower limit, since savings by defini- 
tion can be negative. Literally hundreds of family expenditure surveys in 
this and other countries have confirmed the positive correlation between the 
savings-income ratio and current income. The counter-cyclical movement 
of $w_1$, the share of the upper income groups, is thus the generalization most
wanting in statistical foundation. It is also the one most crucial for the
author’s conclusions about cyclical fluctuations in the distribution of savings.
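The identity $r = w_1 r_1 + w_2 r_2$, with $w_2 = 1 - w_1$, can be solved for the lower-income ratio $r_2$ once $r$, $r_1$, and $w_1$ are given. The figures in this sketch are hypothetical, chosen only to show the arithmetic; they are not Kuznets's estimates, and the function name is ours.

```python
def lower_group_ratio(r: float, r1: float, w1: float) -> float:
    """Solve r = w1*r1 + (1 - w1)*r2 for r2, the savings-income ratio
    of the lower income groups."""
    return (r - w1 * r1) / (1 - w1)

# Hypothetical: a stable upper-group ratio r1 and an upper-group income
# share w1 that rises in contraction (counter-cyclical).
r1 = 0.30
for phase, r, w1 in [("expansion", 0.10, 0.20), ("contraction", 0.02, 0.25)]:
    r2 = lower_group_ratio(r, r1, w1)
    print(f"{phase}: r = {r:.2f}, w1 = {w1:.2f}, implied r2 = {r2:.3f}")
```

With $r_1$ held fixed, a fall in $r$ combined with a rise in $w_1$ forces $r_2$ to fall by more than $r$ does, here from 0.05 to a negative value.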

This work is one of the first to pool information from many sources into an 
integrated analysis of the part played by different groups in the movement 
of national aggregates. It offers an excellent illustration of the long, arduous 
process of assembling and collating information that must precede the 
explicit formulation and testing of hypotheses. The data collected and the 
conclusions reached present the basic material for many new studies, ranging 
from empirical methodology to econometric theory. The author properly 
devoted a large fraction of his attention to problems of measurement. He 
examined, in considerable detail, the quantitative effects of alterations in 
the concepts basic to distribution statistics—the definition of the recipient, 
the definition of income, and the period over which income is accumulated. 
His discussion of these problems of concept challenges research in this field 
to deal with the year-to-year movements of individual units in relative 
economic position abstracted from the influence of demographic factors. 
The author suggests that an understanding of income variability will come 
through more elaborate annual summaries of the shares in income,
consumption, and savings that will reveal the influence of the changing
distribution of the population by such income determinants as occupation,
age, and class of worker. Since such distribution statistics can probably be 
assembled only for scattered dates, they could serve to support this study 
only by stimulating progress in analytical methods that rely less on direct 
measurements of the relevant variables. The importance of original studies 
of this type lies, in no small measure, in the extent to which they relieve 
further research on the subject from the need for comprehensive data. 


Variations in Working Class Family Expenditure. J. L. Nicholson, Journal 
of the Royal Statistical Society, Series A (General), Part IV, 1949, pp. 359-418. 
Reprint, University of Oxford, Institute of Statistics. Two shillings and six 
pence. 


Dorothy S. Brady, University of Illinois


This statistical study of family expenditures is in two parts, the first
devoted to the relation between family expenditures and number of 
children, and the second to the development of a statistical method for 
defining equivalence in the standard of living among families of different 
size. The general problem of identifying the same standard of living in 
different situations has not received the attention from statisticians merited 
by its importance. Mr. Nicholson’s careful investigation is therefore to be 
welcomed as a contribution to our understanding of the problem and as an 
inspiration for further research. 

Equivalence in the standard of living among families of different size was 
defined by the expenditures for a set of goods and services used by the adult 
members of the household. The set was chosen by the requirement that total 
family expenditures should increase with the number of children at each level 
of expenditure on the selected goods and services. This condition will be 
satisfied when expenditures on the selected items decrease with the number of 
children when total family expenditures are held constant. In the data used 
from the working class family budget enquiry of the Ministry of Labor, 
1937-38, only the categories representing adult consumption in a strict 
sense were reduced as the number of children increased and total expenditure
was fixed. These categories were adults’ clothing, tobacco and cigarettes,
drink, and certain miscellaneous items. 

Mr. Nicholson attempted to devise a test for selecting the best combina- 
tion from these categories for his “index” of the standard of living by com- 
paring the differences in total clothing expenditures among families with 0, 
1 and 2 children with the observed expenditures on children’s clothing at 
the same value of the “index.” This comparison does not serve as a test, for it
is based on the assumption that equality in the combined expenditures for 
several groups of items implies equality in the expenditures for the separate 
groups. For a given outlay on all the items purchased for adults there was 
apparently a shift from clothing to other types of expenditure as the size 
of family increased. The devising of tests to use in connection with the
development of indexes is, however, an excellent idea, which should be
incorporated in other studies of this problem.

Mr. Nicholson was handicapped in his investigation because the data 
were classified by total expenditures, a form of summarization that tends to 
obscure the changes in the spending of a given economic group that are 
associated with the number of children. The summarization appropriate 
for studies of this type offers a field for statistical research that has only 
recently been recognized. Mr. Nicholson’s concepts of an index of equiva- 
lence might be useful in the study of this general problem. 


Basic Data of the American Economy. W. Nelson Peach and Walter Krause. 
Third Edition. Chicago: Richard D. Irwin, Inc., 1950. Pp. xix, 248. $3.00
(College $2.00). 


Peter O. STEINER, University of California (Berkeley) 


A compilation of charts and tables under such a title might serve either of
two groups: (1) Economists, businessmen, and others who would like
to have easily accessible a comprehensive collection of basic series otherwise 
available only after considerable bibliographic search; or (2) Students in 
elementary courses in economics, as a supplement to the conventional texts. 
This edition, like the earlier ones, is frankly aimed at the second group and 
any service rendered to the first group must thus be largely incidental. 

While it is not a fair criticism of the collection that this incidental result 
does not occur, it is regrettable. Most of the data are from a small number 
of major, readily available sources, such as the Survey of Current Business, 
The Economic Report of the President, and the Federal Reserve Bulletin, which 
also contain much data that are of interest but are not included in this 
volume. A careful search of the more obscure sources does not seem to have 
been made, and as a result an opportunity to present some of the important 
but repeatedly overlooked series has been missed. Thus, for example, 
Frickey’s Index of Industrial Production 1866-1914 might have been in- 
cluded as a complement to the 1919-1949 production data of the Federal 
Reserve Board Index of Industrial Production. Or data for particular
industries, such as the series on aluminum production 1890-1935 compiled 
from a variety of sources by Donald Wallace in his study of the aluminum 
industry, might usefully have been included. These examples are only sug- 
gestive; a great many careful investigators have developed interesting basic 
series—see, for one source, early numbers of The Review of Economic Sta- 
tistics—which would add to the value of a volume with this title. 

For classroom use the book has obvious merits, but also some limitations 
that seem avoidable. The remainder of this review will focus on some general 
comments on use of the volume for its intended purpose. 

The tables and charts (100-odd of each) are presented in nine sections: 
1. National Income; 2. Population and the Working Force in the United 
States, Including Material on Wage Trends and Social Security; 3. Natural
Resources; 4. Money and Banking; 5. International Trade and Finance; 6. 
Government Expenditures, Tax Collections, Public and Private Debt; 7. 
Price Levels and Economic Fluctuations; 8. Structure and Control of In- 
dustry; 9. Agriculture. 

A major omission seems to the reviewer to be the absence of a section on 
Economic Development, an area of major interest that is particularly sus- 
ceptible to graphic treatment, and for which ample data are available, but 
which is inadequately represented in the present selection. 

There are a series of general limitations that may be briefly noted. One is 
the concentration on aggregate statistics at the expense of various segmental 
or cross-sectional series, a deficiency that is particularly felt in the sections 
concerning economic fluctuations and structure of the economy, and which 
results in these being the weakest in the book. A second is the non-inclusion 
of any quarterly or monthly data. Not only does this prevent even passing 
reference to the evidence of seasonal variation, but it also destroys some 
useful information, particularly in connection with the monetary series 
and with the charts focusing on economic fluctuations. Another, minor, 
criticism is the exclusive reliance on arithmetic scales. The ratio scale is a 
device worth exposing the student to, and for some series it can aid ap- 
preciably in the interpretation of the data. 
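The reviewer's point about the ratio scale can be illustrated numerically. On a ratio (logarithmic) scale, equal percentage changes appear as equal vertical distances, while on an arithmetic scale the same growth rate produces ever-larger steps. The series below is hypothetical (a constant 5 per cent annual growth) and the function names are illustrative only.

```python
import math

def log_steps(series):
    """Vertical distances between successive points on a ratio (log10) scale."""
    return [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

def arithmetic_steps(series):
    """Vertical distances between successive points on an arithmetic scale."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical series growing at a constant 5 per cent a year.
series = [100 * 1.05 ** t for t in range(6)]

print(log_steps(series))         # all equal to log10(1.05)
print(arithmetic_steps(series))  # steadily increasing
```

A constant rate of growth thus shows up as a straight line on the ratio scale but as an upward-bending curve on the arithmetic scale.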

Much of the above amounts to quarreling with the authors’ selection 
process. Perhaps the problem could be solved at least partially if they would 
abandon the conception of this volume as a selection of tables with high- 
lights of the tables presented graphically. If instead they presented a larger 
number of charts, with sources given but not always the figures themselves, 
room could be made for many additional series. This might also reduce the 
reticence about including longer series which rest on somewhat less con- 
tinuous data. 

However, the changes incorporated in the present edition are generally 
improvements, and since the book is designed for revision every second 
year, there is hope that future editions will become progressively more 
adequate. 

On the physical side the format is excellent and the flexible plastic binding 
sturdy and convenient. Obsolescence should precede deterioration by a good 
margin. 


Tables of Working Life—Length of Working Life for Men. Bulletin No. 1001. 
United States Department of Labor, Bureau of Labor Statistics, Washington,
D. C., 1950. Pp. 74.


Mortimer Spiegelman, Metropolitan Life Insurance Company


The tables described in this monograph are designed as a technique for
studying the dynamics of the labor force, chiefly in terms of age-specific 
rates of accession and separation and of average remaining years of labor 
force participation. Results derived from the 1940 census are presented
separately for white and nonwhite males and for total males in urban areas, 
rural areas, and the country as a whole; there is also a table for total males 
in 1947. 

The form of presentation bears a striking similarity to the usual marriage 
and mortality table and to the service table used in pension work. Such 
tables are generally constructed from observed series of age-specific acces- 
sion and separation rates. However, for the tables of this monograph, rates 
of this kind were not obtainable from census sources and an indirect pro- 
cedure was therefore adopted suited to the data available, namely the per 
cent of population in the labor force at single ages (here termed “worker 
rate”). It was assumed that if the age-specific worker rates remain constant 
over time (aside from seasonal variation), then “the differences between 
successive single-year worker rates at a given time may serve as a reasonable 
approximation of the net annual rates of labor force accession or separation 
between successive ages, after allowing for mortality.” 
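A minimal sketch of the idea in the quoted passage, assuming a stationary population with life-table values L_x and constant worker rates w_x; it is not the Bulletin's own computation. The change in the number of workers between successive ages, L_{x+1}w_{x+1} - L_x w_x, splits exactly into a mortality loss w_x(L_x - L_{x+1}) and a net accession term L_{x+1}(w_{x+1} - w_x). The split assumes workers die at the same rate as all males of the same age. The function name and all numbers are hypothetical.

```python
def decompose_worker_change(L, w, start_age):
    """Split the change in workers between successive ages into deaths and net accessions.

    L: hypothetical stationary population (life-table L_x) at successive single ages
    w: hypothetical worker rates (fraction in the labor force) at the same ages
    """
    rows = []
    for i in range(len(L) - 1):
        change = L[i + 1] * w[i + 1] - L[i] * w[i]
        # Workers lost through death, assuming workers die at the general male rate.
        deaths = w[i] * (L[i] - L[i + 1])
        # Remainder: net entries (+) or net withdrawals other than death (-).
        net = L[i + 1] * (w[i + 1] - w[i])
        rows.append({"age": start_age + i, "change": change,
                     "deaths": deaths, "net": net})
    return rows

L = [95_000, 94_800, 94_550, 94_290]
w = [0.60, 0.80, 0.90, 0.93]
for row in decompose_worker_change(L, w, start_age=18):
    print(f"{row['age']}->{row['age'] + 1}: net {row['net']:.0f}, deaths {row['deaths']:.0f}")
```

Once worker rates begin to fall with age, the net term turns negative and measures separations for reasons other than death.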

Being based on a cross-section picture derived from a census, the tables 
have more value as a description of a current situation than for forecasting. 
This is clearly recognized in the statement that “The pattern of working life 
is continually changing. It is affected by trends in mortality and also by 
various long-term social and economic forces.” It is important to bear this 
and other limitations in mind, for otherwise there may be a temptation to 
use these tables rather indiscriminately. Such, indeed, has been the ex- 
perience with life tables generally, and particularly with the figures for ex- 
pectation of life. A special note of warning is given in the text that the figures 
for average remaining years of labor force participation, like those for 
expectation of life, represent only averages and are not applicable to any 
individual. The urban-rural tables of working life are also to be used with 
some caution for they do not take internal migration into account. The 
figures for average remaining years of labor force participation for young 
rural males read as if none of them will ever participate in the urban labor 
force; as a matter of fact, the rural to urban movement of population at 
the productive ages is a characteristic of American life. 

Two examples are given of applications of the tables. The first is in an 
estimate of male labor force accessions and separations during the decade 
1940-1950. Secondly, the tables are used to estimate the new jobs opened 
annually in selected occupations to replace vacancies caused by death and 
by other reasons of withdrawal. In gauging the reliability of these estimates, 
the limitations referred to in the previous paragraph present a vital con- 
sideration. 

There are two technical points worth mentioning. For the purpose of 
forming a life table for total males from available tables for white and 
nonwhite males, the survivorship column of each was weighted by ratios of 
white and of nonwhite male births to total male births. A table derived solely 
from total male experience would be approximated more closely if the
age-specific mortality rates for the two racial categories were weighted by the
age-specific proportions of white and nonwhite males to total males in the 
population. The age-specific “worker rates” used in the computation of the 
table for total males apparently correspond to such a weighted average of 
mortality rates. Secondly, the explanation, on page 68, of the computation 
of accessions to the labor force needs clarifying. The values of $Lw_x$ used in
the formula up to age 32 are not those in column 8 of Table 1a on page 60,
but rather values of $L_x w_x$ obtained from the product of columns 4 and 7.
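The two weighting schemes contrasted above can be sketched as follows. The mortality rates, birth share, and population shares are invented, and the function names are hypothetical. Weighting survivorship columns by birth ratios fixes the racial composition at birth, while weighting mortality rates by age-specific population proportions lets the composition vary with age, as it does in the actual population of total males.

```python
def survivorship(q, radix=100_000):
    """Life-table survivorship column l_x from age-specific mortality rates q_x."""
    l = [radix]
    for qx in q:
        l.append(l[-1] * (1 - qx))
    return l

def combine_by_births(l_white, l_nonwhite, share_white_births):
    """Weight the two survivorship columns by ratios of births (fixed weights)."""
    return [share_white_births * a + (1 - share_white_births) * b
            for a, b in zip(l_white, l_nonwhite)]

def combine_by_population(q_white, q_nonwhite, share_white_pop, radix=100_000):
    """Weight age-specific mortality rates by age-specific population proportions."""
    q_total = [s * a + (1 - s) * b
               for a, b, s in zip(q_white, q_nonwhite, share_white_pop)]
    return survivorship(q_total, radix)

# Hypothetical inputs for three successive ages.
q_white = [0.002, 0.003, 0.004]
q_nonwhite = [0.004, 0.006, 0.008]
l_white, l_nonwhite = survivorship(q_white), survivorship(q_nonwhite)

by_births = combine_by_births(l_white, l_nonwhite, share_white_births=0.88)
by_population = combine_by_population(q_white, q_nonwhite, [0.90, 0.91, 0.92])
print(by_births)
print(by_population)
```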

Lastly, the use of the expression “average life expectancy” is somewhat 
naive. In mathematical writings, the terms mathematical expectation, average, 
and mean are used interchangeably. Since the expectation of life is a mathe- 
matical expectation, there is no need for the modifier average. 
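That the expectation of life is already an average can be checked numerically. The curtate expectation at age x, computed from the survivorship column as the sum of l_{x+k}/l_x for k = 1, 2, ..., equals the mean number of complete years lived after age x when the probability of dying in year k is d_{x+k}/l_x. The l_x values below are hypothetical, and the table is assumed to close with no survivors beyond its last age.

```python
def expectation_from_survivorship(l, x):
    """Curtate e_x as a sum of survivorship ratios l_{x+k} / l_x, k >= 1."""
    return sum(l[x + k] for k in range(1, len(l) - x)) / l[x]

def expectation_as_mean(l, x):
    """Curtate e_x as the mean of the distribution of complete years lived after x."""
    total = 0.0
    for k in range(len(l) - x):
        next_l = l[x + k + 1] if x + k + 1 < len(l) else 0  # no survivors past table end
        deaths = l[x + k] - next_l                          # d_{x+k}
        total += k * deaths / l[x]                          # k years, probability d/l_x
    return total

l = [100, 90, 70, 40, 10]  # hypothetical survivors at successive ages
for x in range(len(l)):
    print(x, expectation_from_survivorship(l, x), expectation_as_mean(l, x))
```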

This monograph is only an initial effort. It was prepared with a keen 
awareness of the limitations of the tables offered and also of the directions 
in which improvements may be sought. With this approach, the efforts are 
worth continuing, especially with the forthcoming 1950 census data. 


Progress and Economic Problems in Farm Mechanization. A. M. Acock and 
others. Washington, D. C.: Food and Agriculture Organization of the United 
Nations, 1950. Pp. vii, 88. $1.00. 


M. R. Cooper, Bureau of Agricultural Economics


This report brings together for the first time information on the progress
of farm mechanization in all parts of the world. It contains estimates of
regional and world totals of draft power used in agriculture at stipulated 
times beginning with 1930. Units of animal and tractor draft power are 
shown separately, and these are combined into a total by using factors for 
converting numbers of horses, mules, buffalos, and draft cattle into equiva- 
lent tractor units. Progress in manufacture of tractors and other farm 
machinery and equipment, and in world trade, is shown by country when 
available, and by region. 
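The combination of animal and tractor draft power into a single total can be sketched as a weighted sum. The conversion factors below are hypothetical placeholders chosen only for illustration; the review does not state the factors the report actually used.

```python
# Hypothetical conversion factors: tractor units per head of draft animal.
# These are NOT the report's factors, which the review does not give.
FACTORS = {"horses": 0.10, "mules": 0.10, "buffalos": 0.07, "draft cattle": 0.05}

def total_draft_power(tractors, animals, factors=FACTORS):
    """Combine tractors and draft animals into tractor-equivalent units."""
    animal_units = sum(count * factors[kind] for kind, count in animals.items())
    return {"tractor": tractors, "animal": animal_units,
            "total": tractors + animal_units}

print(total_draft_power(1000, {"horses": 5000, "draft cattle": 2000}))
```

Because the totals depend directly on the conversion factors, comparisons between regions with very different mixes of animals are only as good as those factors.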

The author reviews these trends and discusses the major economic prob- 
lems associated with expanding the use of modern farm machinery. These 
problems vary greatly from country to country, but it seems apparent that 
there is room for some further expansion in mechanization in all countries. 
Generally, the greatest opportunities exist in countries at high levels of 
economic development. 

The data are based chiefly on official statistics, government reports to 
the Economic Commission for Europe, and information supplied to Food 
and Agriculture Organization working parties. Where published or official 
figures were not available, unofficial estimates have been provided. These 
seem reasonable, and undoubtedly increase the value of the report. 

The report was prepared for the use of government officials and others 
interested in developing programs of agricultural development in the various 
countries. It is well written and shows good judgment in selecting data that
provide an accurate and comparable picture of conditions and possibilities 
in the different regions. It is highly recommended. 


Rural Social Systems: A Textbook in Rural Sociology and Anthropology. 
Charles P. Loomis (Head, Department of Sociology and Anthropology, Michigan 
State College) and J. Allen Beegle (Associate Professor of Sociology and
Anthropology, Michigan State College). New York: Prentice-Hall, Inc., 1950. Pp.
xxvii, 873. $6.75.


William H. Sewell, University of Wisconsin


This is the first attempt to produce a systematic rural sociology text since
the monumental Systematic Source Book in Rural Sociology by Sorokin,
Zimmerman, and Galpin which was published in 1932. While it is not nearly 
as comprehensive as the earlier work, especially in its coverage of other than 
United States source materials, it covers the field in a more systematic 
fashion than any other textbook in rural sociology. Especially notable is its 
inclusion of materials from various anthropological and sociometric studies 
made in recent years in rural areas of the United States. 

The focus throughout the book is on rural social systems which are defined 
as “organizations whose members interact more with members than with 
nonmembers when operating to attain their objectives.” Components re- 
lated to structure, value orientation and locus are stressed throughout the 
analysis. The analysis of variations in value orientation of social systems is 
made by means of the ideal type constructs “familistic Gemeinschaft” and 
“contractual Gesellschaft.” The former is characterized by non-rational, 
sacred, traditional and emotional action, general rights and responsibilities 
of members, a relatively complete “community of fate,” and an integration 
of the roles of members both within and outside of the system, while the 
latter is characterized by rational action, limited responsibility of members, 
functionally specific action, limited responsibility, and the irrelevance of the 
roles of members in other systems to their roles within the system. An at- 
tempt is made to locate the rural social systems studied along a continuum 
ranging between these two polar types. Examples of the familistic Gemein- 
schaft and the contractual Gesellschaft are the Amish family and the 
Division of Extension and Training of the United States Department of 
Agriculture respectively. 

Using this analytical scheme, the following rural social systems are studied: 
the rural family, friendship groups, mutual-aid groups, cliques, the hamlet, 
neighborhood, trade-center community, rural regions, rural social strata, 
religious groups, the rural school, the rural library, rural government, 
farmers’ movements, the Extension Service, the Soil Conservation Service, 
rural health and medical care, and rural welfare and security. As might be 
expected, the application of the analytical scheme is more consistently and 
completely made in some of these areas than in others. In fact, in some cases 
the analysis seems somewhat strained in light of the paucity of existing
data. All-in-all, however, the authors have produced a scholarly and sys- 
tematic book which incorporates much of the research knowledge of rural 
social life in the United States. 

Statisticians should find this book useful, especially as a source of informa- 
tion on contemporary American rural social systems. A great many quantita- 
tive data about rural life are presented in its numerous tables and charts. 
This is not primarily a source book for quantitative information about rural 
life, however, but a book characterized by its integration of research results 
and systematic sociological theory. 


Local Community Fact Book of Chicago. Edited by Louis Wirth and Eleanor H.
Bernert. Chicago, Ill.: University of Chicago Press, 1950. Pp. xii, 152. $2.50.


Raymond F. Sletto, Ohio State University


This book brings together on two pages for each of Chicago’s 75 com-
munities, as previously delineated in University of Chicago studies, in-
formation drawn mainly from U. S. Census Bureau reports for 1940 and
1930, and from special tabulations by the Bureau for the University. Two 
pages presenting like information for the city as a whole, and an outline map 
of each community designating its boundaries, provide needed comparative 
data and reference points. 

Prepared under the auspices of the Chicago Community Inventory and 
with the aid of several research assistants, this compendium continues a 
series of like publications of the University extending back to E. W. Burgess’ 
and Charles Newcomb’s Census Data of the City of Chicago, 1930. It will 
largely replace an earlier publication bearing the same title edited by Louis 
Wirth and Margaret Furey, published in 1930. 

This work differs from the earlier one in several significant respects. 
Readers having an interest in indices of social welfare will miss current data 
on juvenile delinquency, infant mortality, illegitimate births, and the like 
that appeared in the earlier “fact book.” The editors in their Introduction 
indicate that these were omitted to make room for more census material. 
It is questionable, however, whether the space devoted to four population 
pyramids by nativity for each of the 75 communities could not be used more 
advantageously, if devoted to presenting more data not available in the 
U. S. Census. Perhaps the omission of information of primary interest to
sociologists and social workers is compensated for by the inclusion of more 
facts of interest to market research agencies. 

The data available for each community within the limits of two pages are 
found in nine tables containing data on age, nativity, marital status, citizen- 
ship, and education by sex; type of dwelling structure, wages and salaries 
by race and sex, major occupations by sex, employment status by sex, 
dwelling units by year of construction, selected housing characteristics, 
birth and death rates from the records of the Chicago Health Department,
and selected characteristics of households. 

Facts about households include several interesting ratios and averages 
computed from census data. Among the ratios are those of number of 
children, servants, and lodgers to the number of private households. The 
justification for publishing such a volume, in addition to the convenience 
of assembling scattered data, would seem to be largely in the presentation of 
statistical values not to be found elsewhere. These computed characteristics 
of the households may prove to be of substantial worth to those interested in 
market research, who should find the publication of special value. It will 
also be useful to other readers wanting to find quickly the characteristics of 
Chicago communities. The editors are to be commended for having rendered 
so useful a service to other research workers. 


Public Opinion, 1935-1946. Under the editorial direction of Hadley Cantril.
Prepared by Mildred Strunk. (Both of the Office of Public Opinion Research.)
Princeton: Princeton University Press, 1951. Pp. lix, 1191. $25. 


This volume is described on its jacket as “monumental”; and it is quite
literally as large as many a headstone. According to Mrs. Strunk’s
Introduction, it “aims to present as many opinion poll results as possible, 
in a convenient and useful form. The material presented has been collected 
from 23 organizations in 16 countries and covers the period from 1935... 
through 1946.” It includes “most questions asked of national cross sections 
throughout the world”; market research material and local polls are not 
included. 

“In general, the classification and wording of both subjects and cross
references are based on the Library of Congress subject headings. . . . Each
question has been classified under the most specific subject that could be 
assigned to it; under each subject, questions have been arranged in chrono- 
logical order except where questions on a single topic have been assembled 
for presentation in tabular form, when the entire series appears under the 
date of the earliest question.” Classifications by sex, economic status, replies 
to other questions, etc., have been retained. 

It proved impractical to indicate the exact size of the sample used for
each question, but the “number of interviews usually conducted by each of
the organizations” is given (pp. viii-ix), and is interesting in itself. No
information is given about sample designs, interviewing techniques, or other
aspects of the methods by which the data were obtained. 

The sponsors regard the volume as a storehouse of information on opinion 
about literally every subject under the sun, which will be indispensable to 
anyone concerned with public reaction to current events. If it proves as 
useful as they expect, they plan to publish further volumes covering subse- 
quent five-year periods. W. A. W. 


The Development of a Test for Selecting Research Personnel. Mary H. Weislogel.
Pittsburgh, Pennsylvania: American Institute for Research, 1950. Pp. 33.


John G. Darley, University of Minnesota


This brief monograph is entirely a descriptive report of one part of a test
sontenaiiad project being carried out under contract with the Manpower
Branch, Human Resources Division, Office of Naval Research. This
phase of the project sought to develop a test that would: select candidates 
for advanced scientific training or for placement as junior scientific workers; 
emphasize potentiality for, rather than proficiency in, research work; stand- 
ardize test items as predictors of specified “critical requirements” for re- 
search personnel. 

The report contains no basic statistical material by which the merit of 
the research can be judged. Technical appendices containing such material 
are listed as being separately published, but these were not made available 
for review. Furthermore, since security of test items must be maintained, it
is impossible to evaluate the final form of the test as standardized.

In the light of these restrictions and limitations, it is not possible to pre- 
pare a critical or meaningful review; nor will the potential readers of the re- 
port find in it material of more than descriptive import. 

Faverge, J.-M. Introduction aux Méthodes Statistiques en Psychologie Appliquée. Presses Universitaires de France, Paris. 1950. Paper.

Federal Security Agency, Office of Education. Statistics of State School Systems, 1947-48. GPO. 1950. Paper. 30 cents.

Schrock, Edward M. Quality Control and Statistical Methods. Reinhold Publishing Co., New York. 1950. $5.00.

Spiegelman, Mortimer. Health Progress in the United States: A Survey of Recent Trends in Longevity. American Enterprise Association, Inc., New York. 1950. Paper. 50 cents.

Studies in Business and Economics, Vol. IV, No. 3. Measuring Newspaper Readership. College of Business and Public Administration, University of Maryland, College Park. December, 1950. Paper.

Tippett, L. H. C. Technological Applications of Statistics. John Wiley and Sons, Inc. New York. 1950. $3.50.

United Nations Studies in Methods No. 1. Index Numbers of Industrial Production. Statistical Office of the U.N., Lake Success, N. Y. 1950. Paper. 25 cents.

U. S. Bureau of Labor Statistics. Techniques of Preparing Major BLS Statistical Series, Bulletin No. 993. GPO. 1950. Paper. 40 cents.

U. S. Department of Labor. Handbook of Facts on Women Workers, Bulletin No. 237. GPO. 1950. Paper. 30 cents.

West Bengal, Government of. Statistics of Staff Employed under the Government of West Bengal (As on 31st May, 1949). Alipore. 1950. Paper.

Whyte, Lewis G. Principles of Finance and Investment, Volume II. Cambridge University Press, New York. 1951. $2.50.

Yoder, Dale and L. Patricia Nelson. Personnel Salaries and Ratios in 1950. Industrial Relations Center, University of Minnesota. 1950. Paper.





